in reply to Re: Unfortunately benchmarking things with $& isn't easy...
in thread Obtaining server name from UNC path

I don't get such a large difference, but then, I don't muddy the waters by using subs or by calling external programs.

No, you muddy the waters in other ways. :-) First off, using subs or strings makes no difference to the validity of the benchmark, so long as everything benchmarked uses the same technique. Any difference that might result from that would be more than drowned out by noise and by the speed difference of the machines we are using to do the benchmarking.

Second, your assertion that "$& isn't as costly as it used to be" may be true, but is beside the point. The point is that it is far more efficient not to use it. Why don't you run the code I posted and show us just how much it does cost on your system? On my system it would appear that it costs a 125% slowdown. That is significant no matter how you approach it.

Yves / DeMerphq
---
Software Engineering is Programming when you can't. -- E. W. Dijkstra (RIP)


Re: Unfortunately benchmarking things with $& isn't easy...
by Abigail-II (Bishop) on Aug 08, 2002 at 13:37 UTC
    First off, using subs or strings makes no difference to the validity of the benchmark, so long as everything benchmarked uses the same technique.
    That's stupid. When you are benchmarking, you should minimize the time spent on doing things you are not interested in; otherwise it *does* muddy the waters. We are looking at ratios, not absolute differences, and a / b is usually different from (a + c) / (b + c) for c not equal to 0.
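The arithmetic can be illustrated with toy numbers (my own made-up figures, purely to show how a fixed per-clause overhead compresses a benchmark ratio; they are not measurements from this thread):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Suppose version A of the code under test takes 1 time unit per iteration
# and version B takes 2, and each benchmark clause pays a fixed extra cost
# $c per iteration (e.g. the price of a sub call).
my ($t_a, $t_b) = (1.0, 2.0);

for my $c (0, 0.5, 2.0) {
    printf "overhead %.1f: B/A ratio %.2f\n", $c, ($t_b + $c) / ($t_a + $c);
}
```

With no overhead the true 2.00 ratio is reported; as the overhead grows, the reported ratio shrinks toward 1, which is the (a + c) / (b + c) effect described above.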

    Calling Perl subroutines is not cheap.

    Any difference that might result from that would be more than drowned out by noise and by the speed difference of the machines we are using to do the benchmarking.
    What do you mean by that? We are comparing ratios. If the ratios of two algorithms varied wildly from machine to machine, it would be utterly silly to publish any benchmark. The point of looking at the ratios is to diminish the effect of the speed of the machine.
    Why don't you run the code I posted and show us just how much it does cost on your system? On my system it would appear that it costs a 125% slowdown.
    Right figure, wrong conclusion. The slowdown isn't mainly caused by the use of $&, but by the copying involved. The blackadder program ends up with the server name in a normal variable - one which you can modify and which won't be overwritten by the system. Not so with theorbtwo's. By slightly modifying the code, assigning $1 to a variable, the difference drops sharply. Conclusion: blackadder is slower because there's more being copied - not because of the use of $&. And that's only logical, because $& isn't more costly than using parentheses - the costs of $&, and even more so of $` and $', come when you use other regular expressions in your program, for which you don't use $& and friends. But that's not what you are benchmarking.

    Here's my code, bm_blackadder.pl and bm_theorbtwo.pl as yours.

    # bm_theorbtwo_assign.pl
    # benchmark saw ampersand -- theorbtwo
    use strict;
    use warnings;
    use Benchmark qw(timethis);
    use Data::Dumper;

    my $count = $ARGV[0] || -1;
    my $unc   = $ARGV[1] || '\\\\server_name\\sys_share';
    print "Matching $unc for $count\n";
    print Dumper(timethis($count,
        sub { $unc =~ m/^\\\\([^\\]+)\\/; my $server = $1; },
        'theorbtwo'));
    __END__

    # run_bm.pl
    use Benchmark 'cmpthese';
    use Data::Dumper;

    sub run_bm($) {
        my $str = shift;
        my $h;
        $str =~ s/\A(.*)\$VAR1 =/$h=$1;''/se;
        print $h;
        my $v = eval($str);
        die $@ if $@;
        $v
    }

    my $opts = '-5 \\\\\\\\foo\\\\bar\\\\baz.exe';
    my $hash = {
        blackadder       => run_bm(`perl bm_blackadder.pl $opts`),
        theorbtwo        => run_bm(`perl bm_theorbtwo.pl $opts`),
        theorbtwo_assign => run_bm(`perl bm_theorbtwo_assign.pl $opts`),
    };
    cmpthese($hash);
    __END__
    Matching \\foo\bar\baz.exe for -5
    blackadder:  4 wallclock secs ( 5.05 usr +  0.00 sys =  5.05 CPU) @ 71109.11/s (n=359101)
    Matching \\foo\bar\baz.exe for -5
     theorbtwo:  5 wallclock secs ( 5.23 usr +  0.00 sys =  5.23 CPU) @ 171489.10/s (n=896888)
    Matching \\foo\bar\baz.exe for -5
     theorbtwo:  5 wallclock secs ( 5.31 usr +  0.00 sys =  5.31 CPU) @ 90857.63/s (n=482454)
                          Rate blackadder theorbtwo_assign theorbtwo
    blackadder         71109/s         --             -22%      -59%
    theorbtwo_assign   90858/s        28%               --      -47%
    theorbtwo         171489/s       141%              89%        --
    Abigail
      That's stupid. ...

      Hmm, perhaps. The way I was looking at it is that it's a*k / b*k (which I'm not arguing is correct, as I don't know; I'm merely explaining :-), where if k=1 when using eval and k=1.1 when using a subref, the ratio stays the same. Maybe this isn't the correct analysis; if so, please enlighten me, but please without the "stupid" bit - I'm well aware of my own limitations. :-)

      If the ratios of two algorithms would vary wildly from machine to machine...

      I have to admit that I assumed you meant the rate per second. Now that I see what you mean I concede my point is not correct.

      Right figure, wrong conclusion. The slowdown isn't mainly caused by the use of $&, but the copying involved.

      Wow. You are sooo right. If you look closely at my redo of theorbtwo's code and the code in bm_theorbtwo.pl, the assignment that is present is responsible for the difference. When I made sure that the combined version and the separate version were _exactly_ the same, the results were comparable. Thanks. And good point.

      the costs of $&, and even more so of $` and $', come when you use other regular expressions in your program, for which you don't use $& and friends. But that's not what you are benchmarking.

      Actually that was what I was trying to get at, if in a somewhat oblique way. :-) Anyway, it looks to me that the presence of $& doesn't in the end have much effect on the validity of the benchmark. Which is cool and interesting. Thanks Abigail-II.

      BTW, I assume

      my $opts='-5 \\\\\\\\foo\\\\bar\\\\baz.exe';
      is because your shell is converting \\ to \?

      Yves / DeMerphq
      ---
      Software Engineering is Programming when you can't. -- E. W. Dijkstra (RIP)

        Hmm, perhaps. The way I was looking at it is that it's a*k / b*k (which I'm not arguing is correct, as I don't know; I'm merely explaining :-), where if k=1 when using eval and k=1.1 when using a subref, the ratio stays the same.
        But the cost of calling a sub is independent of the size, or running time, of the body of the sub. Hence, you get (fixed) additional time for each of the clauses that contain a sub ref. And you pay that price *for each invocation of the sub*. When using a string, the code is evalled only once.

        Here's a benchmark of a fairly trivial action, once done with a sub, and once done with an eval:

        use strict;
        use warnings 'all';
        use Benchmark 'cmpthese';

        $::a = "foo-bar";

        cmpthese -5 => {
            sub => sub {$::a =~ /([-])/},
            str =>     '$::a =~ /([-])/',
        };
        __END__
        Benchmark: running str, sub for at least 5 CPU seconds...
               str:  5 wallclock secs ( 5.34 usr +  0.00 sys =  5.34 CPU) @ 390154.68/s (n=2083426)
               sub:  5 wallclock secs ( 5.13 usr +  0.00 sys =  5.13 CPU) @ 282139.18/s (n=1447374)
                  Rate   sub   str
        sub 282139/s     --  -28%
        str 390155/s    38%    --
        Here's the relevant code from Benchmark::runloop:
        my ($subcode, $subref);
        if (ref $c eq 'CODE') {
            $subcode = "sub { for (1 .. $n) { local \$_; package $pack; &\$c; } }";
            $subref  = eval $subcode;
        }
        else {
            $subcode = "sub { for (1 .. $n) { local \$_; package $pack; $c;} }";
            $subref  = _doeval($subcode);
        }
        BTW, I assume
        my $opts='-5 \\\\\\\\foo\\\\bar\\\\baz.exe';
        is because your shell is converting \\ to \?
        Yes. I could have used
        my $opts=q!-5 '\\\\foo\\bar\\baz.exe'!;
        too.
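For the record, both halvings can be seen by printing $opts (a sketch of my own; the second halving happens in the shell spawned by the backticks in run_bm.pl, assuming a Unix-ish shell):

```perl
use strict;
use warnings;

# First halving: Perl's single-quoted string turns each \\ into \,
# so the eight backslashes written in the source become four in $opts.
my $opts = '-5 \\\\\\\\foo\\\\bar\\\\baz.exe';
print "after Perl quoting: $opts\n";    # -5 \\\\foo\\bar\\baz.exe

# Second halving: the shell that runs `perl bm_blackadder.pl $opts`
# eats one more level, so the bm_*.pl scripts finally see
# \\foo\bar\baz.exe in @ARGV - the literal UNC path.
```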

        Abigail