in reply to Benchmarking instability
Benchmark.pm is terminally broken. Even the ordering of the cases, which is determined by the lexicographical ordering of their keys, can completely change the outcome (Super Search for "Benchmark" and "bias"). The cases should be run interleaved rather than all of case A, then all of case B, and so on.
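To see the ordering dependence for yourself, note that timethese() runs the cases in sorted key order, so merely renaming a key moves that case earlier or later in the run. A minimal sketch (the two subs here are arbitrary stand-ins, not real competing implementations):

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# timethese() iterates the hash in sorted key order, so 'a_mine' always
# runs before 'b_theirs'. Rename them 'z_mine'/'a_theirs' and the run
# order reverses, which can shift the numbers (cache warming, memory
# layout, etc.) without a single change to the code under test.
timethese( 50_000, {
    a_mine   => sub { my @x = map { $_ * 2 } 1 .. 100 },
    b_theirs => sub { my @x; push @x, $_ * 2 for 1 .. 100 },
});
```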
If the code being benchmarked is destructive, then you have no option but to include the test setup code within the benchmark itself, which as often as not totally obscures the real differences in the code under test.
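For example, comparing two in-place (destructive) sorts forces each iteration to rebuild its input inside the timed sub, so the array copy is billed to both cases and can dwarf the difference you actually wanted to measure. A sketch, with illustrative data:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# The sorts are destructive, so each iteration must copy @template
# inside the timed sub. The copy's cost is charged to both cases and
# can swamp the difference between the two sort calls.
my @template = map { int rand 1000 } 1 .. 500;

cmpthese( 10_000, {
    numeric => sub { my @d = @template; @d = sort { $a <=> $b } @d },
    stringy => sub { my @d = @template; @d = sort @d },
});
```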
The callback nature of the benchmarking interface hampers verifying that each of the tests produces the same results.
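One partial workaround is to call each candidate sub once outside the benchmark and compare their outputs before timing anything; it doesn't fix the interface, but it catches the embarrassing case where you've benchmarked two subs that don't even agree. A sketch with hypothetical candidates:

```perl
use strict;
use warnings;

# Hypothetical candidate implementations of "select the odd elements".
my %case = (
    grep_impl => sub { [ grep { $_ % 2 } @{ $_[0] } ] },
    loop_impl => sub {
        my @out;
        for ( @{ $_[0] } ) { push @out, $_ if $_ % 2 }
        \@out;
    },
);

# Run every case once on the same input and insist the results agree
# before wasting time benchmarking them.
my @input = 1 .. 20;
my %got = map { $_ => join ',', @{ $case{$_}->( \@input ) } } keys %case;
my %seen;
my @distinct = grep { !$seen{$_}++ } values %got;
die "cases disagree: @{[ keys %got ]}" unless @distinct == 1;
print "all cases agree; safe to benchmark\n";
```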
It's very difficult to isolate the cumulative effect of memory consumption across the tests.
Of course, many will tell you that you shouldn't be bothering to benchmark anyway, and these insufficiencies are only important if you take the results of your benchmarks seriously.