The common element is that they're written with the most obvious version of the code, while the ones in the faster bucket are either explicitly vectorized or written in non-obvious ways to help the compiler auto-vectorize. For example, consider the Objective-C version of the loop in leibniz.m:

  for (long i = 2; i <= rounds + 2; i++) {
      x *= -1.0;
      pi += x / (2.0 * i - 1.0);
  }

With my older version of Clang, the resulting assembly at -O3 isn't vectorized. Now look at the C version in leibniz.c:

  rounds += 2u; // do this outside the loop
  for (unsigned i=2u; i < rounds; ++i) // use ++i instead of i++
  {
      double x = -1.0 + 2.0 * (i & 0x1); // allows vectorization
      pi += (x / (2u * i - 1u)); // double / unsigned = double
  }

This produces vectorized code when I compile it, and so does the Objective-C version once I replace its loop with this code.
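
To make the difference concrete, here is a rough sketch of both loops as standalone C functions. The function names, the starting values (x and pi both 1.0), and the final multiply by 4 are assumptions on my part, since the snippets above show only the loops; I've also given both loops the same bounds so they sum exactly the same terms. My reading of the "allows vectorization" comment is that the sign is computed from i alone rather than carried over from the previous iteration, but treat that as a guess about what the compiler cares about, not a guarantee.

```c
/* Sketch only: names, initial values, and the final "* 4" are assumptions,
   not taken from the snippets above. */

/* Obvious form: the sign x is carried from one iteration to the next. */
double leibniz_obvious(unsigned rounds)
{
    double x = 1.0;
    double pi = 1.0;
    for (unsigned i = 2u; i < rounds + 2u; ++i) {
        x *= -1.0;                          /* depends on the previous x */
        pi += x / (2.0 * i - 1.0);
    }
    return 4.0 * pi;
}

/* Index-derived form: the sign is recomputed from i in every iteration. */
double leibniz_from_index(unsigned rounds)
{
    double pi = 1.0;
    unsigned end = rounds + 2u;             /* loop bound computed once */
    for (unsigned i = 2u; i < end; ++i) {
        double x = -1.0 + 2.0 * (i & 0x1);  /* -1.0 for even i, +1.0 for odd i */
        pi += x / (2u * i - 1u);            /* unsigned divisor converts to double */
    }
    return 4.0 * pi;
}
```

Both functions add the same terms in the same order, so they should agree for any given number of rounds when compiled without relaxed floating-point flags. Whether either loop actually gets vectorized depends on the compiler version and flags, so check the generated assembly; with Clang, the -Rpass=loop-vectorize and -Rpass-missed=loop-vectorize remarks will also report what the vectorizer did.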

You see something similar in the other kings-of-speed languages. Zig? It's the C code ported directly to a different syntax. D? Exactly the same. Fortran 90? Slightly different, but still obviously written with compiler vectorization in mind.

(For what it's worth, the trunk version of Clang is able to auto-vectorize either version of the loop without help.)