Unfortunately benchmarking things with $& isn't easy...

Update: One of the reasons I wrote this reply was to illustrate the issues involved in writing a good benchmark, specifically when benchmarking things with and without $& in them. But it seems that I made a number of mistakes myself. Make sure you read Abigail-II's comments below after you read this post; some of what I say turns out to be wrong.

While I agree that benchmarking your solution against blackadder's is an interesting thought, I have to point out that, unfortunately, the way you have done it will produce results that are both incorrect (because your benchmark doesn't do what you think it does) and misleading (because you aren't testing fairly: that dagnatted $& has bitten you :-).

So let's look at the problems with your benchmarking code:

# THIS doesn't mean \\server_name\sys_share, it means \server_name\sys_share
$unc = '\\server_name\sys_share';

# and then you strip "\server_name" off the front of $unc, leaving just
# "\sys_share", before the benchmark even starts!
$unc =~ s/^\W*\w+//;
$server = $&;
$server =~ s/^\W+//;
# None of the regexes in the benchmark will match anymore (in a meaningful way)
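To see the quoting problem concretely, here is a minimal sketch (the variable names are mine, not from the original post) that prints what each single-quoted literal actually contains, what the setup s/// leaves behind, and whether theorbtwo's pattern can still match:

```perl
use strict;
use warnings;

# In a single-quoted Perl string only \\ and \' are special: '\\' collapses
# to one backslash, while '\s' is left alone as a backslash followed by "s".
my $as_written = '\\server_name\sys_share';
my $intended   = '\\\\server_name\\sys_share';

print "$as_written\n";    # \server_name\sys_share
print "$intended\n";      # \\server_name\sys_share

# The setup code then strips the server part off the string itself, so
# every benchmarked regex only ever sees what is left over.
(my $leftover = $as_written) =~ s/^\W*\w+//;
print "$leftover\n";      # \sys_share

# theorbtwo's pattern needs two leading backslashes, so it cannot match now.
my $matched = $leftover =~ m/^\\\\([^\\]+)\\/ ? 1 : 0;
print "matched: $matched\n";   # matched: 0
```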

So, to do the benchmark properly, I modified your code:

#!perl
use warnings;
use Benchmark;

$unc = '\\\\server_name\\sys_share';

my $re = Benchmark::timethese(-5,
    {
        blackadder => sub {
            $lunc = $unc;
            $lunc =~ s/^\W*\w+//;
            $server = $&;
            $server =~ s/^\W+//;
        },
        theorbtwo => sub {
            $unc =~ m/^\\\\([^\\]+)\\/;
            $server = $1;
        },
    }
);
Benchmark::cmpthese($re);
__END__
Benchmark: running blackadder, theorbtwo, each for at least 5 CPU seconds...
blackadder:  6 wallclock secs ( 5.12 usr +  0.00 sys =  5.12 CPU) @ 131371.90/s (n=673281)
 theorbtwo:  5 wallclock secs ( 5.34 usr +  0.00 sys =  5.34 CPU) @ 190423.65/s (n=1017624)
               Rate blackadder  theorbtwo
blackadder 131372/s         --       -31%
theorbtwo  190424/s        45%         --

Which shows that your method is faster than blackadder's, but not by much: only 45%. Luckily for you, this number is _still_ totally wrong. The reason is that $& has an interesting effect on regexes _ANYWHERE_ in a program that uses it: it slows them down massively, because once perl has seen $& it has to copy the target string on every successful match so that $& is available afterwards. (japhy has written a number of articles about this, and about some approaches to resolving the problem.) So in order to benchmark a solution that uses $& against one that doesn't, we need to benchmark them in different perl processes (not forked! totally different), like so:

Program 1: bm_blackadder.pl

# benchmark saw ampersand -- BlackAdder
use strict;
use warnings;
use Benchmark qw(timethis);
use Data::Dumper;

my $count=$ARGV[0] || -1;
my $unc  =$ARGV[1] || '\\\\server_name\\sys_share';
print "Matching $unc for $count\n";
print Dumper(timethis($count,sub {
                  my $lunc = $unc;
                  $lunc =~ s/^\W*\w+//;
                  (my $server = $&)=~ s/^\W+//;
                  $server
                  },'blackadder'
));

Program 2: bm_theorbtwo.pl

# benchmark saw ampersand -- theorbtwo
use strict;
use warnings;
use Benchmark qw(timethis);
use Data::Dumper;

my $count=$ARGV[0] || -1;
my $unc  =$ARGV[1] || '\\\\server_name\\sys_share';
print "Matching $unc for $count\n";
print Dumper(timethis($count,sub {
                  $unc =~ m/^\\\\([^\\]+)\\/;
                  $1;
                }, 'theorbtwo'
));

Program 3: run_bm.pl
(Runs the other two programs and compares their results with cmpthese.)

use Benchmark 'cmpthese';
use Data::Dumper;

sub run_bm($){
    my $str=shift;
    my $h;
    $str=~s/\A(.*)\$VAR1 =/$h=$1;''/se;
    print $h;
    my $v=eval($str);
    die $@ if $@;
    $v
}

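# Shell quoting: cmd.exe passes these backslashes through untouched, but a
# Unix shell strips one level, so double them again there.
my $opts='-5 \\\\foo\\bar\\baz.exe';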

my $hash={
          blackadder => run_bm(`perl bm_blackadder.pl $opts`),
          theorbtwo  => run_bm(`perl bm_theorbtwo.pl $opts`),
         };
cmpthese($hash);
__END__
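The trick run_bm() relies on is that Data::Dumper output is valid Perl, so a Benchmark object printed by one process can be eval'd back into existence by another. Here is a small self-contained sketch of that round trip with no child processes involved; split_dump() is a hypothetical helper of mine, and the "Matching something" header is made up to stand in for what the child scripts print:

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);
use Data::Dumper;

# Same idea as run_bm(): split the human-readable header off a
# Data::Dumper dump, then eval the "$VAR1 = ..." part back into an object.
sub split_dump {
    my $str = shift;
    my ($header, $dump) = $str =~ /\A(.*?)(\$VAR1 = .*)\z/s
        or die "no Data::Dumper output found\n";
    my $VAR1;          # the eval'd dump assigns to this lexical
    eval $dump;
    die $@ if $@;
    return ($header, $VAR1);
}

# Time something trivial, then fake the output of a child benchmark script.
my $t = timeit(1000, sub { my $x = "abc" =~ /b/ });
my $fake_output = "Matching something\n" . Dumper($t);

my ($header, $bm) = split_dump($fake_output);
print $header;
print ref($bm), "\n";
print timestr($bm), "\n";
```

Because the revived value is a genuine Benchmark object, a hash of them can be handed straight to cmpthese(), which is what run_bm.pl does.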

When set up correctly, run_bm.pl outputs:

Matching \\foo\bar\baz.exe for -5
blackadder:  6 wallclock secs ( 5.22 usr +  0.00 sys =  5.22 CPU) @ 135204.60/s (n=705768)
Matching \\foo\bar\baz.exe for -5
 theorbtwo:  6 wallclock secs ( 5.17 usr +  0.00 sys =  5.17 CPU) @ 367160.48/s (n=1898954)
               Rate blackadder  theorbtwo
blackadder 135205/s         --       -63%
theorbtwo  367160/s       172%         --

Showing that your solution is about 172% faster than blackadder's! Much better than the 45% you might have thought it was! (And also showing the cost that $& has on your code if you are foolish enough to use it, or if someone else has snuck it into their code and you don't know about it.)
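Those percentages are relative rates. A minimal sketch of the arithmetic, assuming cmpthese() computes each cell as 100 * row_rate / column_rate - 100 (which is what I believe Benchmark.pm does); pct_faster() is a hypothetical helper name, and the rates are copied from the two tables above:

```perl
use strict;
use warnings;

# How much faster (in percent) the row entry is than the column entry,
# computed from the iterations-per-second rates that Benchmark reports.
sub pct_faster {
    my ($rate_row, $rate_col) = @_;
    return 100 * $rate_row / $rate_col - 100;
}

# Both subs in one process (the $& penalty hits theorbtwo too):
printf "%.0f%%\n", pct_faster(190424, 131372);   # 45%
# Separate processes:
printf "%.0f%%\n", pct_faster(367160, 135205);   # 172%
# The same pair read the other way round, as in the blackadder row:
printf "%.0f%%\n", pct_faster(135205, 367160);   # -63%
```

So "172% faster" means theorbtwo ran about 2.7 times as many iterations per second as blackadder.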

BTW, you are correct that blackadder's solution is not correct; my point in this reply is that benchmarking regexes that use $& is not as simple as one might think (or like). Oh, also: in future you should avoid using fixed counts in your benchmarks. It is almost always better to use a negative number, which tells Benchmark to run each test for at least that many CPU seconds. The more seconds the better (I find 5-10 seconds is usually good).
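If you want to sidestep $& entirely, here are a few ways to get the matched text without ever mentioning it, so the program-wide penalty never kicks in. This is a minimal sketch of my own, not code from the thread; it assumes perl 5.6 or later for @- and @+, and 5.10 or later for the /p flag. (According to perlvar, perl 5.20 enabled copy-on-write strings by default, which removed the $& penalty altogether, so on a modern perl this matters far less.)

```perl
use strict;
use warnings;

my $unc = '\\\\server_name\\sys_share';   # \\server_name\sys_share

# 1. Capture only what you need with parentheses.
my ($server) = $unc =~ m/^\\\\([^\\]+)\\/;

# 2. If you really need "the whole match", @- and @+ hold the start and
#    end offsets of the last successful match, so substr() can rebuild it.
my $whole;
if ($unc =~ m/^\W*\w+/) {
    $whole = substr($unc, $-[0], $+[0] - $-[0]);
}

# 3. Perl 5.10+ only: the /p flag makes ${^MATCH} available for that
#    one match, without the global cost of $&.
my $whole_p;
if ($unc =~ m/^\W*\w+/p) {
    $whole_p = ${^MATCH};
}

print "$server\n";    # server_name
print "$whole\n";     # \\server_name
print "$whole_p\n";   # \\server_name
```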

HTH.

Yves / DeMerphq
---
Software Engineering is Programming when you can't. -- E. W. Dijkstra (RIP)

Re: Unfortunately benchmarking things with $& isn't easy... by Abigail-II (Bishop) on Aug 08, 2002 at 12:23 UTC
I don't get such a large difference, but then, I don't muddy the waters by using subs or by calling external programs.

use warnings 'all';
use strict;

use Benchmark 'cmpthese';

$::unc = '\\\\server_name\\sys_share';

cmpthese -5 => {
    blackadder => 'my $tmp = $::unc;
                   $tmp =~ s/^\W*\w+//;
                   $server = $&;
                   $server =~ s/^\W+//;',
    theorbtwo  => '$::unc =~ m/^\\\\\\\\([^\\\\]+)\\\\/;  # Urgle.
                   my $server = $1;',
};

__END__
Name "main::unc" used only once: possible typo at bench line 6.
Benchmark: running blackadder, theorbtwo for at least 5 CPU seconds...
blackadder:  5 wallclock secs ( 5.08 usr +  0.00 sys =  5.08 CPU) @ 76238.39/s (n=387291)
 theorbtwo:  6 wallclock secs ( 5.07 usr +  0.00 sys =  5.07 CPU) @ 97024.65/s (n=491915)
              Rate blackadder  theorbtwo
blackadder 76238/s         --       -21%
theorbtwo  97025/s        27%         --

The reasonably small difference is what I expect. $& isn't as costly as it used to be, and since the blackadder code only has two regexes, you pay the price twice, compared to once for theorbtwo (due to the parens). The bigger factor is that blackadder has to _copy_ the string, while theorbtwo doesn't.

Abigail
Re: Re: Unfortunately benchmarking things with $& isn't easy... by demerphq (Chancellor) on Aug 08, 2002 at 12:42 UTC
> I don't get such a large difference, but then, I don't muddy the waters by using subs or by calling external programs.

No, you muddy the waters in other ways. :-)

First off, using subs or strings makes no difference to the validity of the benchmark, so long as everything benchmarked uses the same technique. Any difference that might result from that would be more than drowned out by noise and by the speed difference of the machines we are using to do the benchmarking.

Second, your assertion that "$& isn't as costly as it used to be" may be true, but is beside the point. The point is that it is far more efficient not to use it. Why don't you run the code I posted and show us just how much it does cost on your system? On my system it would appear that it costs a 125% slowdown. That is significant no matter how you approach it.

Yves / DeMerphq
Re: Unfortunately benchmarking things with $& isn't easy... by Abigail-II (Bishop) on Aug 08, 2002 at 13:37 UTC
> First off, using subs or strings makes no difference to the validity of the benchmark, so long as everything benchmarked uses the same technique.

That's stupid. When you are benchmarking, you should minimize the time spent doing things you are not interested in; otherwise it does muddy the waters. We are looking at ratios, not absolute differences, and a / b is usually different from (a + c) / (b + c) for c not equal to 0. Calling Perl subroutines is not cheap.

> Any difference that might result from that would be more than drowned out by noise and by the speed difference of the machines we are using to do the benchmarking.

What do you mean by that? We are comparing ratios. If the ratios of two algorithms varied wildly from machine to machine, it would be utterly silly to publish any benchmark. The point of looking at the ratios is to diminish the effect of the speed of the machine.

> Why don't you run the code I posted and show us just how much it does cost on your system? On my system it would appear that it costs a 125% slowdown.

Right figure, wrong conclusion. The slowdown isn't mainly caused by the use of $&, but by the copying involved. The blackadder program ends up with the server name in a normal variable, one which you can modify and which won't be overwritten by the system. Not so with theorbtwo. By slightly modifying the code, assigning $1 to a variable, the difference drops sharply. Conclusion: blackadder is slower because more is being copied, not because of the use of $&.

And that's only logical, because $& isn't more costly than using parentheses. The costs of $&, and even more so of $` and $', come when you are using other regular expressions in your program for which you don't use $& and friends. But that's not what you are benchmarking.

Here's my code; bm_blackadder.pl and bm_theorbtwo.pl are the same as yours.
# bm_theorbtwo_assign.pl
# benchmark saw ampersand -- theorbtwo
use strict;
use warnings;
use Benchmark qw(timethis);
use Data::Dumper;

my $count=$ARGV[0] || -1;
my $unc  =$ARGV[1] || '\\\\server_name\\sys_share';
print "Matching $unc for $count\n";
print Dumper(timethis($count,sub {
                  $unc =~ m/^\\\\([^\\]+)\\/;
                  my $server = $1;
                }, 'theorbtwo'
));
__END__

# run_bm.pl
use Benchmark 'cmpthese';
use Data::Dumper;

sub run_bm($){
    my $str=shift;
    my $h;
    $str=~s/\A(.*)\$VAR1 =/$h=$1;''/se;
    print $h;
    my $v=eval($str);
    die $@ if $@;
    $v
}

my $opts='-5 \\\\\\\\foo\\\\bar\\\\baz.exe';

my $hash={
          blackadder       => run_bm(`perl bm_blackadder.pl $opts`),
          theorbtwo        => run_bm(`perl bm_theorbtwo.pl $opts`),
          theorbtwo_assign => run_bm(`perl bm_theorbtwo_assign.pl $opts`),
         };
cmpthese($hash);
__END__
Matching \\foo\bar\baz.exe for -5
blackadder:  4 wallclock secs ( 5.05 usr +  0.00 sys =  5.05 CPU) @ 71109.11/s (n=359101)
Matching \\foo\bar\baz.exe for -5
 theorbtwo:  5 wallclock secs ( 5.23 usr +  0.00 sys =  5.23 CPU) @ 171489.10/s (n=896888)
Matching \\foo\bar\baz.exe for -5
 theorbtwo:  5 wallclock secs ( 5.31 usr +  0.00 sys =  5.31 CPU) @ 90857.63/s (n=482454)
                     Rate blackadder theorbtwo_assign theorbtwo
blackadder        71109/s         --             -22%      -59%
theorbtwo_assign  90858/s        28%               --      -47%
theorbtwo        171489/s       141%              89%        --

Abigail
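Abigail-II's point about ratios can be made concrete with made-up numbers. The per-iteration costs below are hypothetical, chosen only to show how a fixed overhead c paid by both sides (such as a sub call) pulls a / b towards 1:

```perl
use strict;
use warnings;

# Hypothetical per-iteration costs in microseconds; not measured values.
my $cost_a   = 2;   # the slower snippet under test
my $cost_b   = 1;   # the faster snippet under test
my $overhead = 3;   # a fixed cost paid by both, e.g. calling a sub

my $pure   = $cost_a / $cost_b;                              # a / b
my $padded = ($cost_a + $overhead) / ($cost_b + $overhead);  # (a + c) / (b + c)

printf "a / b             = %.2f\n", $pure;     # 2.00
printf "(a + c) / (b + c) = %.2f\n", $padded;   # 1.25
```

The same 2:1 difference shows up as only 25% once the shared overhead is included.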
Re: Re: Unfortunately benchmarking things with $& isn't easy... by demerphq (Chancellor) on Aug 08, 2002 at 14:48 UTC
Re: Unfortunately benchmarking things with $& isn't easy... by Abigail-II (Bishop) on Aug 08, 2002 at 16:04 UTC
Re: Unfortunately benchmarking things with $& isn't easy... by theorbtwo (Prior) on Aug 08, 2002 at 12:18 UTC
Sigh. One of these days, I'll learn to proofread and test code before posting to PM, even on things that I shouldn't have to. In this particular case, I did test the code... inasmuch as I tested whether it ran, but not whether it gave the correct results. I never did a mental sanity check on the code from the original post, or printed the output of either method.

As to the $& problems, I did know about them, but figured that they were too advanced for the poster, and there's a better way to do it anyway. I knew about the leakage of badness, figured it would affect my benchmarks, and that it wouldn't be a major effect. Seems I underestimated how bad it was.

Confession: It does an Immortal Body good.