Benchmarking instability

tlm has asked for the wisdom of the Perl Monks concerning the following question:

I found some puzzling instabilities while benchmarking a couple of alternatives that came up in a recent thread.

First the code (the only difference between before.pl and after.pl is that the definitions of $X and $Y are reversed from one to the other):

# before.pl
use strict;
use warnings;

use Benchmark 'cmpthese';

my $sz      = ( shift || 10 ) - 4;
my $X = 'N' . 'x' x $sz . '000';
my $Y = 'N' . 'x' x $sz . '001';

print '# input length: ', length( $X ), $/;

cmpthese( -1,
          {
           '?!X'  => '$X =~ /^N(?!.*00$).*$/',
           '?!Y'  => '$Y =~ /^N(?!.*00$).*$/',
           '?<!X' => '$X =~ /^N.*(?<!00$)$/',
           '?<!Y' => '$Y =~ /^N.*(?<!00$)$/',
           }
        );
__END__


# after.pl
use strict;
use warnings;

use Benchmark 'cmpthese';

my $sz      = ( shift || 10 ) - 4;
my $X = 'N' . 'x' x $sz . '001';
my $Y = 'N' . 'x' x $sz . '000';

print '# input length: ', length( $X ), $/;

cmpthese( -1,
          {
           '?!X'  => '$X =~ /^N(?!.*00$).*$/',
           '?!Y'  => '$Y =~ /^N(?!.*00$).*$/',
           '?<!X' => '$X =~ /^N.*(?<!00$)$/',
           '?<!Y' => '$Y =~ /^N.*(?<!00$)$/',
           }
        );
__END__
[download]

And here's some sample output (I have seen many variations; this example only illustrates the general instability):

% perl before.pl 7
# input length: 7
          Rate ?<!Y  ?!Y ?<!X  ?!X
?<!Y 1553555/s   -- -70% -75% -77%
?!Y  5222982/s 236%   -- -17% -22%
?<!X 6285324/s 305%  20%   --  -6%
?!X  6718816/s 332%  29%   7%   --
% perl after.pl
# input length: 7
          Rate ?<!X  ?!Y ?<!Y  ?!X
?<!X 2414483/s   --  -1%  -4% -19%
?!Y  2434774/s   1%   --  -3% -18%
?<!Y 2506841/s   4%   3%   -- -16%
?!X  2970871/s  23%  22%  19%   --
[download]

Note that the fastest alternative in before.pl is more than twice as fast as the fastest one in after.pl. Likewise the ratio between the fastest and the slowest in before.pl is much greater than the same ratio in after.pl.

What's going on?

the lowliest monk

Comment on Benchmarking instability Select or Download Code

Replies are listed 'Best First'.
Re: Benchmarking instability by ikegami (Patriarch) on Jun 06, 2005 at 16:54 UTC
Your test is bad. Your code strings are being `eval`uated inside of Benchmark, oustside of $X and $Y's scope. Right now, you're using package variables $main::X and $main::Y, which are undefined. Proof: use strict; use warnings; use Benchmark 'cmpthese'; my $sz = ( shift \|\| 10 ) - 4; my $X = 'N' . 'x' x $sz . '000'; my $Y = 'N' . 'x' x $sz . '001'; cmpthese( 1, { '?!X' => 'use strict; $X =~ /^N(?!.00$).$/', '?!Y' => 'use strict; $Y =~ /^N(?!.00$).$/', '?<!X' => 'use strict; $X =~ /^N.(?<!00$)$/', '?<!Y' => 'use strict; $Y =~ /^N.(?<!00$)$/', } ); __END__ Benchmark: timing 1 iterations of ?!X, ?!Y, ?<!X, ?<!Y... runloop unable to compile 'use strict; $X =~ /^N(?!.00$).$/': Global + symbol "$X" requires explicit package name at (eval 2) line 1. code: sub { for (1 .. 1) { local $_; package main; use strict; $X =~ / +^N(?!.00$).$/;} } [download] Fix: `use strict; use warnings; use Benchmark 'cmpthese'; my $sz = ( shift \|\| 10 ) - 4; my $X = 'N' . 'x' x $sz . '000'; my $Y = 'N' . 'x' x $sz . '001'; print '# input length: ', length( $X ), $/; cmpthese( -1, { '?!X' => sub { scalar $X =~ /^N(?!.00$).$/ }, '?!Y' => sub { scalar $Y =~ /^N(?!.00$).$/ }, '?<!X' => sub { scalar $X =~ /^N.(?<!00$)$/ }, '?<!Y' => sub { scalar $Y =~ /^N.(?<!00$)$/ }, } );` [download] The `scalar` shouldn't make a difference as long as you don't have captures. The real difference is the sub {} instead of ''. The subs now capture over $X and $Y. Using `our $X` and `our $Y` instead of `my $X` and `my $Y` would also have done the trick. With the fixed code, it gives me the following, even when reversed: `X = Nxxxxxx000 Y = Nxxxxxx001 Rate ?<!X ?!Y ?!X ?<!Y ?<!X 327712/s -- -39% -56% -59% ?!Y 535499/s 63% -- -29% -34% ?!X 750772/s 129% 40% -- -7% ?<!Y 805492/s 146% 50% 7% --` [download]	[reply] [d/l] [select]
Re^2: Benchmarking instability by tlm (Prior) on Jun 06, 2005 at 20:24 UTC
Thanks++, that explains it. After seeing your post, I vaguely recall having seen a "`Benchmark` gotcha" somewhere that had a similar explanation, but I can't place it. I normally use subs when I do benchamrks, but this time I figured that the operations being benchmarked were so fast and the differences between them potentially so slight, that the overhead of calling the subs would significantly distort the results. Therefore, I repeated the tests, still using `eval`'ed strings, but replacing the relevant lexicals with package variables. Here are the results: `# input length: 7 Rate ?<!Y ?!X ?<!X ?!Y ?<!Y 613304/s -- -32% -50% -52% ?!X 908420/s 48% -- -27% -29% ?<!X 1236528/s 102% 36% -- -4% ?!Y 1286220/s 110% 42% 4% --` [download] The number of executions per second indeed goes up significantly (as I expected), but the ratio between the fastest and the slowest goes down, which make no sense to me. So the puzzle is seriously wounded, but not dead yet ;-) . Read more... (829 Bytes) the lowliest monk	[reply] [d/l] [select]
Re: Benchmarking instability by BrowserUk (Patriarch) on Jun 06, 2005 at 17:25 UTC
Benchmark.PM is terminally broken. Even the ordering of the cases, which is determined by the lexigraphical ordering of the keys, can completely change the outcome (super search for "Benchmark" and "bias"). The cases should be run interleaved rather all of case A, then all of case B etc. If the code being benchmarked is destructive, then you have no option but to include test setup code within the benchmark, often as not totally obscuring the real differences in the code under test. The callback nature of the benchmarking hampers verification that each of the tests is producing the same results. It's very difficult to isolate the cumulative affect of memory consumption across the tests. Of course, many will tell you that you shouldn't be bothering to benchmark anyway, and these insufficiencies are only important if you take the results of your benchmarks seriously. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.	[reply]