Benchmark: timing 10000 iterations of my code, their code...
my code: 1 wallclock secs ( 0.92 usr + 0.06 sys = 0.98 CPU) @ 10193.68/s (n=10000)
...
my code: 33 wallclock secs (21.50 usr + 9.36 sys = 30.86 CPU) @ 64800.41/s (n=2000000)
their code: 40 wallclock secs (28.05 usr + 8.83 sys = 36.88 CPU) @ 54226.99/s (n=2000000)