in reply to Benchmarking instability

Benchmark.pm is terminally broken. Even the ordering of the cases, which is determined by the lexicographical ordering of their hash keys, can completely change the outcome (Super Search for "Benchmark" and "bias"). The cases should be run interleaved rather than all of case A, then all of case B, and so on.
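
A rough sketch of the interleaving idea, using Time::HiRes and List::Util rather than Benchmark.pm itself (the case names and counts are invented for illustration):

    use strict;
    use warnings;
    use Time::HiRes qw( time );
    use List::Util  qw( shuffle );

    my %cases = (
        concat => sub { my $s = ''; $s .= 'x' for 1 .. 100; $s },
        join   => sub { join '', ('x') x 100 },
    );

    my %elapsed;
    for ( 1 .. 10_000 ) {
        # Visit every case once per pass, in random order, so no case
        # systematically benefits from warmed caches or a grown heap.
        for my $name ( shuffle keys %cases ) {
            my $t0 = time;
            $cases{$name}->();
            $elapsed{$name} += time - $t0;
        }
    }
    printf "%-8s %.4fs\n", $_, $elapsed{$_} for sort keys %elapsed;

Timing each call individually adds overhead of its own, so this only sketches the interleaving; it isn't a drop-in replacement.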

If the code being benchmarked is destructive, then you have no option but to include test setup code within the benchmark, as often as not totally obscuring the real differences in the code under test.
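
For example (a made-up comparison, using Benchmark.pm's cmpthese), timing two destructive operations means rebuilding the array inside each timed sub, and that rebuild can dwarf the operation being measured:

    use strict;
    use warnings;
    use Benchmark qw( cmpthese );

    # The ( 1 .. 1000 ) construction runs on every iteration and
    # dominates the timing of the one-element removal that follows.
    cmpthese( -3, {
        shift_op  => sub { my @work = ( 1 .. 1000 ); shift @work },
        splice_op => sub { my @work = ( 1 .. 1000 ); splice @work, 0, 1 },
    } );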

The callback nature of the benchmarking interface hampers verification that each of the tests is producing the same results.
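
One workaround is to call each coderef once, outside the timing run, and compare the returns before trusting any numbers. A sketch, assuming the cases are argument-free and return comparable scalars (the case names here are hypothetical):

    use strict;
    use warnings;

    my %cases = (
        grep_count => sub { scalar grep { $_ % 2 } 1 .. 1000 },
        loop_count => sub {
            my $n = 0;
            for ( 1 .. 1000 ) { ++$n if $_ % 2 }
            $n;
        },
    );

    my @names   = sort keys %cases;
    my %results = map { $_ => $cases{$_}->() } @names;

    # Compare every case against the first before timing anything.
    # (String eq suffices for scalars; deep structures would need
    # something like Test::More's is_deeply.)
    for my $name ( @names ) {
        $results{$name} eq $results{ $names[0] }
            or die "$name returned $results{$name}, expected $results{ $names[0] }\n";
    }
    print "All cases agree; now safe to hand %cases to cmpthese().\n";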

It's very difficult to isolate the cumulative effect of memory consumption across the tests.
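
One way to isolate that effect is to run each case in its own forked child, so one case's allocations cannot inflate or fragment the heap seen by the next. A rough sketch, for platforms with a real fork (the cases and repeat count are illustrative):

    use strict;
    use warnings;
    use Time::HiRes qw( time );

    my %cases = (
        big_array => sub { my @a = ( 1 .. 1_000_000 ) },
        big_hash  => sub { my %h = map { $_ => 1 } 1 .. 100_000 },
    );

    for my $name ( sort keys %cases ) {
        defined( my $pid = fork ) or die "fork failed: $!";
        if ( $pid == 0 ) {
            # Child: a fresh process, so this case starts with a clean
            # heap regardless of what earlier cases allocated.
            my $t0 = time;
            $cases{$name}->() for 1 .. 100;
            printf "%-10s %.3fs\n", $name, time - $t0;
            exit 0;
        }
        waitpid $pid, 0;    # run cases one at a time, not concurrently
    }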

Of course, many will tell you that you shouldn't be bothering to benchmark anyway, and that these deficiencies only matter if you take the results of your benchmarks seriously.

