in reply to Unfortunately benchmarking things with $& isn't easy...
in thread Obtaining server name from UNC path

I don't get such a large difference, but then, I don't muddy the waters by using subs or by calling external programs.
use warnings 'all';
use strict;
use Benchmark 'cmpthese';

$::unc = '\\\\server_name\\sys_share';

cmpthese -5 => {
    blackadder => 'my $tmp = $::unc;
                   $tmp =~ s/^\W*\w+//;
                   $server = $&;
                   $server =~ s/^\W+//;',
    theorbtwo  => '$::unc =~ m/^\\\\\\\\([^\\\\]+)\\\\/;   # Urgle.
                   my $server = $1;',
};
__END__
Name "main::unc" used only once: possible typo at bench line 6.
Benchmark: running blackadder, theorbtwo for at least 5 CPU seconds...
blackadder:  5 wallclock secs ( 5.08 usr +  0.00 sys =  5.08 CPU) @ 76238.39/s (n=387291)
 theorbtwo:  6 wallclock secs ( 5.07 usr +  0.00 sys =  5.07 CPU) @ 97024.65/s (n=491915)
              Rate blackadder  theorbtwo
blackadder 76238/s         --       -21%
theorbtwo  97025/s        27%         --

The reasonably small difference is what I expect. $& isn't as costly as it used to be, and since the blackadder code has only two regexes, you pay the price twice, compared to once for theorbtwo (due to the parens). The larger difference is that blackadder has to _copy_ the string, which doesn't happen in theorbtwo.

Abigail

Re: Re: Unfortunately benchmarking things with $& isn't easy...
by demerphq (Chancellor) on Aug 08, 2002 at 12:42 UTC
    I don't get such a large difference, but then, I don't muddy the waters by using subs or by calling external programs.

    No, you muddy the waters in other ways. :-) First off, using subs or strings makes no difference to the validity of the benchmark so long as everything benchmarked uses the same technique. Any difference that might result from that would be more than drowned out by noise and by the speed difference of the machines we are using to do the benchmarking.

    Second, your assertion that "$& isn't as costly as it used to be" may be true, but is beside the point. The point is that it is far more efficient not to use it. Why don't you run the code I posted and show us just how much it does cost on your system? On my system it would appear that it costs a 125% slowdown. That is significant no matter how you approach it.

    Yves / DeMerphq
    ---
    Software Engineering is Programming when you can't. -- E. W. Dijkstra (RIP)

      First off using subs or strings makes no difference to validity of the benchmark so long as everything benchmarked uses the same technique.
      That's stupid. When you are benchmarking, you should minimize the time spent on doing things you are not interested in, otherwise it *does* muddy the waters. We are looking at ratios, not absolute differences, and a / b is usually different from (a + c) / (b + c) for c not equal to 0.
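
      As a rough illustration with made-up numbers (assumed for the example, not measured): say one snippet takes a = 10 microseconds per iteration, the other b = 5, and the overhead added to both, such as a sub call, is c = 5. The measured ratio then shrinks from 2.0 to 1.5, even though neither snippet changed:

      ```latex
      \[
      \frac{a}{b} = \frac{10}{5} = 2.0,
      \qquad
      \frac{a + c}{b + c} = \frac{10 + 5}{5 + 5} = 1.5
      \]
      % In general, for a > b > 0 and c > 0:
      \[
      \frac{a + c}{b + c} < \frac{a}{b}
      \iff b(a + c) < a(b + c)
      \iff bc < ac
      \iff b < a
      \]
      ```

      So any fixed additive overhead pulls the reported ratio toward 1 and understates the real difference. Only an overhead that scales both timings by the same factor would leave the ratio untouched.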

      Calling Perl subroutines is not cheap.

      Any difference that might result from that would be more than drowned out by noise and by the speed difference of the machines we are using to do the benchmarking.
      What do you mean by that? We are comparing ratios. If the ratios of two algorithms would vary wildly from machine to machine, it would be utterly silly to publish any Benchmark. The point of looking at the ratios is to diminish the effect of the speed of the machine.
      Why don't you run the code I posted and show us just how much it does cost on your system? On my system it would appear that it costs a 125% slowdown.
      Right figure, wrong conclusion. The slowdown isn't mainly caused by the use of $&, but by the copying involved. The blackadder program ends up with the server name in a normal variable, one which you can modify and which won't be overwritten by the system. Not so with theorbtwo. By slightly modifying the code, assigning $1 to a variable, the difference drops sharply. Conclusion: blackadder is slower because there's more being copied, not because of the use of $&. And that's only logical, because $& isn't more costly than using parentheses. The costs of $&, and more so of $` and $', come if you are using other regular expressions in your program for which you don't use $& and friends. But that's not what you are benchmarking.

      Here's my code; bm_blackadder.pl and bm_theorbtwo.pl are the same as yours.

      # bm_theorbtwo_assign.pl
      # benchmark saw ampersand -- theorbtwo
      use strict;
      use warnings;
      use Benchmark qw(timethis);
      use Data::Dumper;
      my $count=$ARGV[0] || -1;
      my $unc  =$ARGV[1] || '\\\\server_name\\sys_share';
      print "Matching $unc for $count\n";
      print Dumper(timethis($count,sub {
          $unc =~ m/^\\\\([^\\]+)\\/;
          my $server = $1;
      }, 'theorbtwo' ));
      __END__

      # run_bm.pl
      use Benchmark 'cmpthese';
      use Data::Dumper;
      sub run_bm($){
          my $str=shift;
          my $h;
          $str=~s/\A(.*)\$VAR1 =/$h=$1;''/se;
          print $h;
          my $v=eval($str);
          die $@ if $@;
          $v
      }
      my $opts='-5 \\\\\\\\foo\\\\bar\\\\baz.exe';
      my $hash={
          blackadder       => run_bm(`perl bm_blackadder.pl $opts`),
          theorbtwo        => run_bm(`perl bm_theorbtwo.pl $opts`),
          theorbtwo_assign => run_bm(`perl bm_theorbtwo_assign.pl $opts`),
      };
      cmpthese($hash);
      __END__
      Matching \\foo\bar\baz.exe for -5
      blackadder:  4 wallclock secs ( 5.05 usr +  0.00 sys =  5.05 CPU) @ 71109.11/s (n=359101)
      Matching \\foo\bar\baz.exe for -5
       theorbtwo:  5 wallclock secs ( 5.23 usr +  0.00 sys =  5.23 CPU) @ 171489.10/s (n=896888)
      Matching \\foo\bar\baz.exe for -5
       theorbtwo:  5 wallclock secs ( 5.31 usr +  0.00 sys =  5.31 CPU) @ 90857.63/s (n=482454)
                           Rate       blackadder theorbtwo_assign        theorbtwo
      blackadder        71109/s               --             -22%             -59%
      theorbtwo_assign  90858/s              28%               --             -47%
      theorbtwo        171489/s             141%              89%               --

      Abigail
        That's stupid. ...

        Hmm, perhaps. The way I was looking at it is that it's (a * k) / (b * k) (which I'm not arguing is correct, as I don't know; I'm merely explaining :-). If k=1 when using eval and k=1.1 when using a subref, the ratio stays the same. Maybe this isn't the correct analysis; if so please enlighten me, but please without the "stupid" bit. I'm well aware of my own limitations. :-)

        If the ratios of two algorithms would vary wildly from machine to machine...

        I have to admit that I assumed you meant the rate per second. Now that I see what you mean I concede my point is not correct.

        Right figure, wrong conclusion. The slowdown isn't mainly caused by the use of $&, but the copying involved.

        Wow. You are sooo right. If you look closely at my redo of theorbtwo's code and the code in bm_theorbtwo.pl, the assignment that is present is responsible for the difference. When I made sure that the combined version and the separate version were _exactly_ the same, the results were comparable. Thanks. And good point.

        the costs of $&, and more so of $` and $', come if you are using other regular expressions in your program for which you don't use $& and friends. But that's not what you are benchmarking.

        Actually that was what I was trying to get at, if in a somewhat oblique way. :-) Anyway, it looks to me like the presence of $& doesn't in the end have much effect on the validity of the benchmark. Which is cool and interesting. Thanks Abigail-II.

        BTW, I assume

        my $opts='-5 \\\\\\\\foo\\\\bar\\\\baz.exe';
        is because your shell is converting \\ to \?

        Yves / DeMerphq