in reply to Benchmarking the block and list forms of grep

Useless Benchmark!

In addition to the context issues already mentioned, your subs don't see @short, @none and @long because they are compiled in a scope where these my variables are not visible. (Put use strict; inside of the q{...} and you'll see.)

Update: The rest of this post should be ignored. To fix the above problem, change my @short to our @short, etc.

Never use q{...} with Benchmark; use sub {...} instead. It's too easy to make a scope mistake when using uncompiled code.

Fixed benchmark code:

use strict; use warnings; use Benchmark qw( cmpthese ); my @none = ('a' .. 'm' ); my @short = ('a', ''); my @long = ('a' .. 'z', ''); my $iter = shift || -3; cmpthese( $iter, { short_block_ne => sub { my @a = grep {$_ ne ''} @short; 1 }, short_block_len => sub { my @a = grep {length} @short; 1 }, short_bare_ne => sub { my @a = grep $_ ne '', @short; 1 }, short_bare_len => sub { my @a = grep length, @short; 1 }, } ); cmpthese( $iter, { none_block_ne => sub { my @a = grep {$_ ne ''} @none; 1 }, none_block_len => sub { my @a = grep {length} @none; 1 }, none_bare_ne => sub { my @a = grep $_ ne '', @none; 1 }, none_bare_len => sub { my @a = grep length, @none; 1 }, } ); cmpthese( $iter,{ long_block_ne => sub { my @a = grep {$_ ne ''} @long; 1 }, long_block_len => sub { my @a = grep {length} @long; 1 }, long_bare_ne => sub { my @a = grep $_ ne '', @long; 1 }, long_bare_len => sub { my @a = grep length, @long; 1 }, } );

Results:

Rate short_block_ne short_bare_ne short_block_len +short_bare_len short_block_ne 231725/s -- -4% -7% + -11% short_bare_ne 240180/s 4% -- -3% + -7% short_block_len 248671/s 7% 4% -- + -4% short_bare_len 259288/s 12% 8% 4% + -- Rate none_block_ne none_bare_ne none_block_len non +e_bare_len none_block_ne 55395/s -- -2% -3% + -4% none_bare_ne 56348/s 2% -- -1% + -2% none_block_len 57047/s 3% 1% -- + -1% none_bare_len 57787/s 4% 3% 1% + -- Rate long_block_ne long_bare_ne long_block_len lon +g_bare_len long_block_ne 28861/s -- -3% -4% + -8% long_bare_ne 29744/s 3% -- -2% + -5% long_block_len 30206/s 5% 2% -- + -3% long_bare_len 31265/s 8% 5% 4% + --

Replies are listed 'Best First'.
Re^2: Benchmarking the block and list forms of grep
by BrowserUk (Patriarch) on Mar 23, 2006 at 01:02 UTC
    Never use q{...} with Benchmark; use sub {...} instead. It's too easy to make a scope mistake when using uncompiled code.

    Hmm. Mistakes are mistakes. Using our for "setup variables" used within the tests but declared external to it, is a simple solution to the scoping problem.

    Whilst using coderefs is an effective ways of capturing external variables through closure, it has other side effects that can influence your benchmarks in much more subtle and harder to detect ways. This is especially true for the often seen 'communal' benchmark where different monks implementations of a subroutine are tested and there are parameters to the subs:

    sub monk1 { my( $arg, @data ) = @_; ## do stuff return @results; } sub monk2 { my( $arg, @data ) = @_; ## do stuff return @results; } .... my @data = (....); cmpthese -3, { monk1 => sub{ my @results = monk1( @data ); }, monk2 => sub{ my @results = monk2( @data ); }, .... };

    The problem here is that you are adding an extra level of subroutine call to each of the implementations under test. If the work being done inside the subroutine is substantial, then the extra overhead will be lost in the noise of the testing and have no substantial effect upon the outcome. But if the code being tested is itself not doing a great deal, then the additional overhead of the extra level of sub call can swamp the differences in the implementation and render the benchmark useless as a result.

    To demonstrate this effect using the grep example, if we grep a list of 10 items using your prefered coderef form of benchmark, we get these results:

    10 items Rate sub_block_short sub_list_short sub_block_short 305975/s -- -4% sub_list_short 319968/s 5% --

    from which you might conclude that the list form of grep is 5% faster than the block form. However, if you perform the same test using the q[ ... ] form of benchmark you get:

    10 items Rate eval_block_short eval_list_short eval_block_short 322569/s -- -1% eval_list_short 325028/s 1% --

    From which you might conclude that the list form is only 1% quicker! Assuming an otherwise quiescent system, this is a rather less dramatic difference, that is probably confined to the realms of 'noise', and definitly not worth bothering with if you consider the block form clearer than the list form.

    And the reason for the difference between the two tests is the overhead of the extra layer of subroutine call in the coderef form of the benchmark. That extra overhead completely obscures the actual differences we are attempting to assertain and explore. In this case, using the block form renders the benchmark useless as the construction of the benchmark itself has influenced and obscured the results we are attempting to define.

    This stands out more when you campare the 4 forms together:

    10 items Rate sub_blk_sh sub_lst_sh eval_blk_sh eval_lst_ +sh sub_blk_sh 305975/s -- -4% -5% - +6% sub_lst_sh 319968/s 5% -- -1% - +2% eval_blk_sh 322569/s 5% 1% -- - +1% eval_lst_sh 325028/s 6% 2% 1% +--

    With the eval_block form coming out as 5% faster to the sub_list form despite that the block form has to be slower due to the construction of the extra level of scope!

    That clearly demonstrates that the construction of the benchmark itself can have a fairly dramatic influence upon the results obtained.

    Of course, as the number of items in the list increases, so the effect of the extra overhead becomes ammortised over an increasing large number of scopes generations and has much less influence on the outcome.

    100_000 items Rate eval_blk_vl sub_blk_vl eval_lst_vl sub_lst_vl eval_blk_vl 36.4/s -- -1% -3% -4% sub_blk_vl 36.6/s 1% -- -2% -3% eval_lst_vl 37.4/s 3%* 2% -- -1% sub_list_vl 37.8/s 4% 3%* 1% --

    Here we see that the differences between the eval_list & block forms, and those between the sub_list & block forms have equalled out at 3%. Less dramatic than the 5% and more significant than 1%.

    However, you will also notice that here the sub forms have overtaken the eval forms. What happened to that extra overhead? If there is an extra overhead attached to the coderef form of benchmark, you would expect that the eval forms would remain faster than their equivalent coderef forms. The differences would be reduced as the overhead is amortised, but still the overhead should be present and show up.

    The conclusion I draw from that is that, at these levels, the cost of constructing the extra scope required by the block form of grep is so insignificant that it simply gets lost in the noise of the benchmark; the internal inconsistancies of the Benchmark module itself; the impact of Perl's memory management; and external factors relating to system usage. As such, I long ago abandoned the list form of grep in favour of the (IMO), syntactically clearer block form as the efficiency justification of using the list form simply doesn't hold up in use.

    Even for the most highly iterated, deeply nested uses of grep, the performance of the construct is much more likely to be influenced by whether I hit a graphic heavy page in my browser; or list the files in my directory; or even just wiggle my mouse about a little; than whether I have used the list or block form.

    If there are any overall conclusions to be drawn, they are:

    1. Benchmarking well is hard.
    2. Drawing the right conclusions from any given benchmark is even harder.
    3. And never, ever say(*) "Never use ..." or "Always use ..." or "You should (not) do ...".

    (*)Except in this case of course :)

    Benchmark code and full results:


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      The problem here is that you are adding an extra level of subroutine call to each of the implementations under test.

      I always thought the string was compiled into a sub, but I now see that it isn't.

Re^2: Benchmarking the block and list forms of grep
by diotalevi (Canon) on Mar 23, 2006 at 00:44 UTC

    And thus.. you've benchmarked how much overhead the two operations leave( enter( EXPR ) ) have over just EXPR. Neato.

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

      What extra call? q{} gets compiled to a sub, so leave(enter()) was already present.

      Furthermore — and my whole point — the subs now close over lexicals @short, @none and @long, whereas the q{} code used @main::short, @main::none and @main::long, all of which were empty. You could use our @short, etc as an alternative, but when I suggested that in the past, it seemed to confuse people.

      Update: oh! I see the code is not compiled into a sub:

      if (ref $c eq 'CODE') { $subcode = "sub { for (1 .. $n) { local \$_; package $pack; &\$c; +} }"; $subref = eval $subcode; } else { $subcode = "sub { for (1 .. $n) { local \$_; package $pack; $c;} } +"; $subref = _doeval($subcode); }

      You'd think that would be mentioned somewhere prominant in the documentation. I can only find a mention in a footnote on another topic.