in reply to Benchmarking instability

Your test is bad. Your code strings are being evaluated inside Benchmark, outside the scope of your lexicals $X and $Y. Right now, you're actually using the package variables $main::X and $main::Y, which are undefined.

Proof:

    use strict;
    use warnings;
    use Benchmark 'cmpthese';

    my $sz = ( shift || 10 ) - 4;
    my $X  = 'N' . 'x' x $sz . '000';
    my $Y  = 'N' . 'x' x $sz . '001';

    cmpthese( 1, {
        '?!X'  => 'use strict; $X =~ /^N(?!.*00$).*$/',
        '?!Y'  => 'use strict; $Y =~ /^N(?!.*00$).*$/',
        '?<!X' => 'use strict; $X =~ /^N.*(?<!00$)$/',
        '?<!Y' => 'use strict; $Y =~ /^N.*(?<!00$)$/',
    } );

    __END__
    Benchmark: timing 1 iterations of ?!X, ?!Y, ?<!X, ?<!Y...
    runloop unable to compile 'use strict; $X =~ /^N(?!.*00$).*$/': Global symbol "$X" requires explicit package name at (eval 2) line 1.
    code: sub { for (1 .. 1) { local $_; package main; use strict; $X =~ /^N(?!.*00$).*$/;} }
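The same scoping rule can be reproduced without Benchmark. In this sketch the package Runner and its subs run_string and run_sub are hypothetical stand-ins for Benchmark's internals: run_string compiles a code string from a point where the caller's file-scoped lexical $X is not in scope (much as Benchmark compiles strings from inside its own file), while run_sub simply calls a closure that was created where $X is visible.

```perl
use strict;
use warnings;

package Runner;    # hypothetical stand-in for Benchmark's internals

# Compile a code string here. The file-scoped "my $X" below is declared
# after this sub, so it is not in scope for the string eval, and an
# unqualified $X resolves to the package variable $main::X instead.
sub run_string {
    my ($code) = @_;
    my $sub = eval "package main; no strict 'vars'; sub { $code }"
        or die $@;
    return $sub->();
}

# Call a code ref; a closure keeps access to the lexicals it closed over.
sub run_sub {
    my ($sub) = @_;
    return $sub->();
}

package main;

my $X = 'Nxxxxxx000';

my $from_string = Runner::run_string('defined $X ? "defined" : "undef"');
my $from_sub    = Runner::run_sub( sub { defined $X ? "defined" : "undef" } );

print "string: $from_string\n";
print "sub:    $from_sub\n";
```

Run as-is, it prints "string: undef" and then "sub:    defined": the string sees only the (empty) package variable, while the closure sees the lexical.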

Fix:

    use strict;
    use warnings;
    use Benchmark 'cmpthese';

    my $sz = ( shift || 10 ) - 4;
    my $X  = 'N' . 'x' x $sz . '000';
    my $Y  = 'N' . 'x' x $sz . '001';

    print '# input length: ', length( $X ), $/;

    cmpthese( -1, {
        '?!X'  => sub { scalar $X =~ /^N(?!.*00$).*$/ },
        '?!Y'  => sub { scalar $Y =~ /^N(?!.*00$).*$/ },
        '?<!X' => sub { scalar $X =~ /^N.*(?<!00$)$/ },
        '?<!Y' => sub { scalar $Y =~ /^N.*(?<!00$)$/ },
    } );

The scalar shouldn't make a difference as long as the patterns have no captures. The real difference is passing sub {} instead of a '' string: the subs are now closures over $X and $Y. Using our $X and our $Y instead of my $X and my $Y would also have done the trick.

With the fixed code, it gives me the following, even when reversed:

    X = Nxxxxxx000
    Y = Nxxxxxx001
             Rate  ?<!X   ?!Y   ?!X  ?<!Y
    ?<!X 327712/s    --  -39%  -56%  -59%
    ?!Y  535499/s   63%    --  -29%  -34%
    ?!X  750772/s  129%   40%    --   -7%
    ?<!Y 805492/s  146%   50%    7%    --
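A sketch of the our alternative mentioned above, keeping the code strings. One caveat: our only declares a lexically scoped alias, so the strings must not say use strict themselves (or must spell out $main::X); they rely on Benchmark compiling string code in package main without strict vars, which appears to be the case, since the original strings compiled even though $X was silently undefined.

```perl
use strict;
use warnings;
use Benchmark 'cmpthese';

my $sz = ( shift || 10 ) - 4;

# Package variables ($main::X, $main::Y) instead of lexicals, so the
# code strings that Benchmark compiles in package main can find them.
our $X = 'N' . 'x' x $sz . '000';
our $Y = 'N' . 'x' x $sz . '001';

# No "use strict" inside the strings: "our" is lexically scoped, so an
# unqualified $X there would still trip strict vars.
my $rows = cmpthese( -1, {
    '?!X'  => '$X =~ /^N(?!.*00$).*$/',
    '?!Y'  => '$Y =~ /^N(?!.*00$).*$/',
    '?<!X' => '$X =~ /^N.*(?<!00$)$/',
    '?<!Y' => '$Y =~ /^N.*(?<!00$)$/',
} );
```

cmpthese prints the usual rate chart and also returns it as an array ref of rows (header row first), which is kept in $rows here.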

Re^2: Benchmarking instability
by tlm (Prior) on Jun 06, 2005 at 20:24 UTC

    Thanks++, that explains it. After seeing your post, I vaguely recall having seen a "Benchmark gotcha" somewhere that had a similar explanation, but I can't place it.

    I normally use subs when I do benchmarks, but this time I figured that the operations being benchmarked were so fast, and the differences between them potentially so slight, that the overhead of calling the subs would significantly distort the results. Therefore, I repeated the tests, still using eval'ed strings, but replacing the relevant lexicals with package variables. Here are the results:

        # input length: 7
                  Rate  ?<!Y   ?!X  ?<!X   ?!Y
        ?<!Y  613304/s    --  -32%  -50%  -52%
        ?!X   908420/s   48%    --  -27%  -29%
        ?<!X 1236528/s  102%   36%    --   -4%
        ?!Y  1286220/s  110%   42%    4%    --

    The number of executions per second indeed goes up significantly (as I expected), but the ratio between the fastest and the slowest goes down, which makes no sense to me. So the puzzle is seriously wounded, but not dead yet ;-).

    the lowliest monk
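One way to test the sub-overhead worry directly is to hand Benchmark the same match both ways and compare. This is a minimal sketch with a hard-coded input string (an assumption, chosen to match the 10-character inputs above). If I read Benchmark.pm correctly, timeit already times an empty baseline of the same kind (an empty sub for code refs, an empty string for code strings) and subtracts it, so the gap measured here should be closer to the residual cost than to the raw call overhead.

```perl
use strict;
use warnings;
use Benchmark 'cmpthese';

# Package variable, so both the code string and the closure can see it.
our $Y = 'Nxxxxxx001';

# The same match handed to Benchmark two ways. The gap between the two
# rates roughly reflects what is left of the sub-call cost after
# Benchmark's empty-baseline correction (plus ordinary timing noise).
my $rows = cmpthese( -1, {
    'string' => '$Y =~ /^N(?!.*00$).*$/',
    'sub'    => sub { $Y =~ /^N(?!.*00$).*$/ },
} );
```

Running it a few times, and in both orders, gives some idea of how much of the spread is noise rather than a real difference.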