The other day here, someone mentioned the idea of using

  grep defined, @array

rather than the more usual block form

  grep {defined} @array

the reason being that, since it avoids creating a block scope, the former code will run faster than the latter. Indeed, decompiling the two snippets tends to confirm the hypothesis: the op-codes are otherwise identical, but the block form carries a few extra ops (a scope and an ex-nextstate) that the blockless form omits.

  # perl -MO=Concise -e 'grep {defined} @array'
  a  <@> leave[1 ref] vKP/REFC ->(end)
  1     <0> enter ->2
  2     <;> nextstate(main 2 -e:1) v ->3
  7     <|> grepwhile(other->8)[t2] vK/1 ->a
  6        <@> grepstart K*/2 ->7
  3           <0> pushmark s ->4
  -           <1> null lK/1 ->4
  -           <1> null sK/1 ->7
  -              <@> scope sK ->7
  -                 <0> ex-nextstate v ->8
  9                 <1> defined sK/1 ->-
  -                    <1> ex-rv2sv sK*/1 ->9
  8                       <$> gvsv(*_) s ->9
  5           <1> rv2av[t1] lKM/1 ->6
  4              <$> gv(*array) s ->5

  # perl -MO=Concise -e 'grep defined, @array'
  a  <@> leave[1 ref] vKP/REFC ->(end)
  1     <0> enter ->2
  2     <;> nextstate(main 1 -e:1) v ->3
  7     <|> grepwhile(other->8)[t2] vK/1 ->a
  6        <@> grepstart K/2 ->7
  3           <0> pushmark s ->4
  -           <1> null lK/1 ->4
  9           <1> defined sK/1 ->7
  -              <1> ex-rv2sv sK*/1 ->9
  8                 <$> gvsv(*_) s ->9
  5           <1> rv2av[t1] lKM/1 ->6
  4              <$> gv(*array) s ->5

As it turns out, I've got some code that is too slow, and a recurring theme in that code is the fact that I have lists that may have an empty string in them, and I need to process all the non-empty strings. Hence there are many loops of the form

  for my $element (grep {$_ ne ''} @array) {...}

So I've been wondering what, if any, impact it would have on performance if I were to change over to the blockless form. So I set up a benchmark comparing the block and bare forms of the grep, each with an ne '' test and a length test (the full code is below).

And I ran those snippets against a medium-length array with no empty strings, and against a short and a long array each containing an empty string. I should point out that my usual definition of long is a hundred thousand or a million, but here I was interested in a specific domain, where long never means more than twenty or so.

The results surprised me. In fact, the results are so much all over the map that it is difficult to draw any conclusions. The results that interest me the most are those of short and none. As it happens, these two show the greatest differences; alas, they are just about mirror images of each other in terms of performance.

But to a first approximation, the performance of grep {code} @list is not shabby at all, and the remaining differences are lost in the noise.

                       Rate short_block_ne short_bare_ne short_bare_len short_block_len
  short_block_ne  3008231/s             --           -0%           -10%            -20%
  short_bare_ne   3014515/s             0%            --           -10%            -20%
  short_bare_len  3342828/s            11%           11%             --            -12%
  short_block_len 3783871/s            26%           26%            13%              --

                       Rate none_bare_len none_block_len none_bare_ne none_block_ne
  none_bare_len   3390307/s            --            -9%         -11%          -13%
  none_block_len  3708523/s            9%             --          -3%           -5%
  none_bare_ne    3807187/s           12%             3%           --           -2%
  none_block_ne   3885959/s           15%             5%           2%            --

                       Rate long_bare_ne long_block_len long_block_ne long_bare_len
  long_bare_ne    3635361/s           --            -6%           -6%           -8%
  long_block_len  3869054/s           6%             --           -0%           -2%
  long_block_ne   3872708/s           7%             0%            --           -2%
  long_bare_len   3963159/s           9%             2%            2%            --

And the benchmark code:

  #! /usr/bin/perl -w

  use strict;
  use Benchmark 'cmpthese';

  my @none  = ('a' .. 'm');
  my @short = ('a', '');
  my @long  = ('a' .. 'z', '');

  my $iter = shift || -1;

  print "block_ne : [$_]\n" for grep {$_ ne ''} @short;
  print "block_len: [$_]\n" for grep {length} @short;
  print "bare_ne  : [$_]\n" for grep $_ ne '', @short;
  print "bare_len : [$_]\n" for grep length, @short;

  cmpthese( $iter, {
      short_block_ne  => q{grep {$_ ne ''} @short},
      short_block_len => q{grep {length} @short},
      short_bare_ne   => q{grep $_ ne '', @short},
      short_bare_len  => q{grep length, @short},
  } );

  cmpthese( $iter, {
      none_block_ne  => q{grep {$_ ne ''} @none},
      none_block_len => q{grep {length} @none},
      none_bare_ne   => q{grep $_ ne '', @none},
      none_bare_len  => q{grep length, @none},
  } );

  cmpthese( $iter, {
      long_block_ne  => q{grep {$_ ne ''} @long},
      long_block_len => q{grep {length} @long},
      long_bare_ne   => q{grep $_ ne '', @long},
      long_bare_len  => q{grep length, @long},
  } );

I think I'll change grep {$_ ne ''} to grep {length} and be done with it.

• another intruder with the mooring in the heart of the Perl

Re: Benchmarking the block and list forms of grep
by hv (Prior) on Mar 14, 2006 at 11:35 UTC

    Beware context. A quick test:

    perl -MBenchmark=cmpthese -wle 'cmpthese(1, { context => q{print wantarray ? "list" : defined(wantarray) ? "scalar" : "void"}})'
    shows that these blocks are called in void context, and I believe that grep has optimisations in place for that. It would be worth assigning the results to an array to make sure you are actually benchmarking the right thing.
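    A minimal sketch of forcing list context (the array and labels here are illustrative, not from the original benchmark): assigning grep's result inside each snippet puts the grep in list context, so the void-context shortcut cannot kick in.

    ```perl
    use strict;
    use warnings;
    use Benchmark 'cmpthese';

    # Package variable, so the q{...} snippets (compiled in the
    # caller's package) can actually see it.
    our @short = ('a', '');

    cmpthese( -1, {
        # The assignment to @kept forces list context.
        block_ne => q{my @kept = grep {$_ ne ''} @short},
        bare_ne  => q{my @kept = grep $_ ne '', @short},
    } );
    ```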

    However I tried that here, and got similar results to you (the maximum difference being 10%), so I think you can discount permutations of the grep as a source of significant savings and concentrate on whichever version reads best to you.

    In the context of the for loop, however, you may be better off losing the grep altogether - writing it as something like:

    for my $element (@array) {
        length($element) or next;
        ...
    }
    would avoid the need to copy the list to the stack, which might well be a win for long lists.
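    A sketch of how that comparison might be set up (the $n++ body is a stand-in for real per-element work; with this little work per element, expect the difference to be largely noise):

    ```perl
    use strict;
    use warnings;
    use Benchmark 'cmpthese';

    our @array = ('a' .. 'z', '');

    cmpthese( -1, {
        # grep builds a filtered copy of the list on the stack first
        grep_first => sub {
            my $n = 0;
            $n++ for grep { length } @array;
            return $n;
        },
        # testing inside the loop avoids that copy
        next_inside => sub {
            my $n = 0;
            for my $element (@array) {
                length($element) or next;
                $n++;
            }
            return $n;
        },
    } );
    ```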

    Hugo

Re: Benchmarking the block and list forms of grep
by brian_d_foy (Abbot) on Mar 14, 2006 at 15:50 UTC

    You may want to see Wasting time thinking about wasted time. You really have to think about the numbers that Benchmark gives back to you, and running something four million times a second probably means you aren't actually running anything interesting. The benchmark I presented had the same problem you run into: void context optimization.

    Update: After playing with this for a bit (I'm writing the Benchmarking Perl talk I'm giving tonight), I think your benchmark isn't doing anything at all. Your code doesn't see your arrays. When I change the number of elements in the lists, the numbers don't change. When I change the arrays to package variables, things make much more sense. There still isn't a big difference between the methods though, but at least you aren't running a no-op 4 million times a second. :)

    Another update: I featured this node in my UniForum talk tonight: Benchmarking Perl.

    --
    brian d foy <brian@stonehenge.com>
    Subscribe to The Perl Review
Re: Benchmarking the block and list forms of grep
by ikegami (Patriarch) on Mar 22, 2006 at 22:39 UTC

    Useless Benchmark!

    In addition to the context issues already mentioned, your subs don't see @short, @none and @long because they are compiled in a scope where these my variables are not visible. (Put use strict; inside of the q{...} and you'll see.)

    Update: The rest of this post should be ignored. To fix the above problem, change my @short to our @short, etc.
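    In other words, the fix can be as small as this (a sketch of the relevant change, not the full benchmark):

    ```perl
    use strict;
    use warnings;
    use Benchmark 'cmpthese';

    # my @short would be invisible to the q{...} snippets below, which
    # are compiled in package main; a package variable is visible.
    our @short = ('a', '');

    cmpthese( -1, {
        short_block_ne => q{grep {$_ ne ''} @short},
        short_bare_ne  => q{grep $_ ne '', @short},
    } );
    ```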

      Never use q{...} with Benchmark; use sub {...} instead. It's too easy to make a scope mistake when using uncompiled code.

      Hmm. Mistakes are mistakes. Using our for "setup variables" that are used within the tests but declared outside them is a simple solution to the scoping problem.

      Whilst using coderefs is an effective way of capturing external variables through closure, it has other side effects that can influence your benchmarks in much more subtle and harder-to-detect ways. This is especially true for the often-seen 'communal' benchmark where different monks' implementations of a subroutine are tested and there are parameters to the subs:

      sub monk1 {
          my( $arg, @data ) = @_;
          ## do stuff
          return @results;
      }

      sub monk2 {
          my( $arg, @data ) = @_;
          ## do stuff
          return @results;
      }

      ....

      my @data = (....);

      cmpthese -3, {
          monk1 => sub{ my @results = monk1( @data ); },
          monk2 => sub{ my @results = monk2( @data ); },
          ....
      };

      The problem here is that you are adding an extra level of subroutine call to each of the implementations under test. If the work being done inside the subroutine is substantial, then the extra overhead will be lost in the noise of the testing and have no substantial effect upon the outcome. But if the code being tested is itself not doing a great deal, then the additional overhead of the extra level of sub call can swamp the differences in the implementation and render the benchmark useless as a result.
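      That overhead can be seen directly by benchmarking the same cheap call both ways (a sketch; work() is a hypothetical stand-in for a monk's sub, not code from the thread):

      ```perl
      use strict;
      use warnings;
      use Benchmark 'cmpthese';

      sub work { return scalar @_ }   # deliberately trivial

      our @data = (1 .. 10);

      cmpthese( -1, {
          # coderef form: Benchmark calls this anon sub, which then
          # calls work() -- one extra sub call per iteration
          coderef => sub { my $r = work(@data) },
          # string form: the call to work() is inlined into the sub
          # that Benchmark builds, so there is no wrapper call
          string  => q{ my $r = work(@data) },
      } );
      ```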

      To demonstrate this effect using the grep example, if we grep a list of 10 items using your preferred coderef form of benchmark, we get these results:

      10 items
                           Rate sub_block_short sub_list_short
      sub_block_short  305975/s              --            -4%
      sub_list_short   319968/s              5%             --

      from which you might conclude that the list form of grep is 5% faster than the block form. However, if you perform the same test using the q[ ... ] form of benchmark you get:

      10 items
                            Rate eval_block_short eval_list_short
      eval_block_short  322569/s               --             -1%
      eval_list_short   325028/s               1%              --

      From which you might conclude that the list form is only 1% quicker! Assuming an otherwise quiescent system, this is a rather less dramatic difference, one that is probably confined to the realms of 'noise', and definitely not worth bothering with if you consider the block form clearer than the list form.

      And the reason for the difference between the two tests is the overhead of the extra layer of subroutine call in the coderef form of the benchmark. That extra overhead completely obscures the actual differences we are attempting to ascertain and explore. In this case, using the coderef form renders the benchmark useless, as the construction of the benchmark itself has influenced and obscured the results we are attempting to define.

      This stands out more when you compare the 4 forms together:

      10 items
                        Rate sub_blk_sh sub_lst_sh eval_blk_sh eval_lst_sh
      sub_blk_sh    305975/s         --        -4%         -5%         -6%
      sub_lst_sh    319968/s         5%         --         -1%         -2%
      eval_blk_sh   322569/s         5%         1%          --         -1%
      eval_lst_sh   325028/s         6%         2%          1%          --

      With the eval_block form coming out 5% faster than the sub_block form, and even edging ahead of the sub_list form, despite the fact that the block form has to be slower due to the construction of the extra level of scope!

      That clearly demonstrates that the construction of the benchmark itself can have a fairly dramatic influence upon the results obtained.

      Of course, as the number of items in the list increases, the effect of the extra overhead becomes amortised over an increasingly large number of scope generations and has much less influence on the outcome.

      100_000 items
                     Rate eval_blk_vl sub_blk_vl eval_lst_vl sub_lst_vl
      eval_blk_vl  36.4/s          --        -1%         -3%        -4%
      sub_blk_vl   36.6/s          1%         --         -2%        -3%
      eval_lst_vl  37.4/s         3%*         2%          --        -1%
      sub_lst_vl   37.8/s          4%        3%*          1%         --

      Here we see that the differences between the eval_list & block forms, and those between the sub_list & block forms have equalled out at 3%. Less dramatic than the 5% and more significant than 1%.

      However, you will also notice that here the sub forms have overtaken the eval forms. What happened to that extra overhead? If there is an extra overhead attached to the coderef form of benchmark, you would expect that the eval forms would remain faster than their equivalent coderef forms. The differences would be reduced as the overhead is amortised, but still the overhead should be present and show up.

      The conclusion I draw from that is that, at these levels, the cost of constructing the extra scope required by the block form of grep is so insignificant that it simply gets lost in the noise of the benchmark; the internal inconsistencies of the Benchmark module itself; the impact of Perl's memory management; and external factors relating to system usage. As such, I long ago abandoned the list form of grep in favour of the (IMO) syntactically clearer block form, as the efficiency justification for using the list form simply doesn't hold up in use.

      Even for the most highly iterated, deeply nested uses of grep, the performance of the construct is much more likely to be influenced by whether I hit a graphic heavy page in my browser; or list the files in my directory; or even just wiggle my mouse about a little; than whether I have used the list or block form.

      If there are any overall conclusions to be drawn, they are:

      1. Benchmarking well is hard.
      2. Drawing the right conclusions from any given benchmark is even harder.
      3. And never, ever say(*) "Never use ..." or "Always use ..." or "You should (not) do ...".

      (*)Except in this case of course :)

      Benchmark code and full results:


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        The problem here is that you are adding an extra level of subroutine call to each of the implementations under test.

        I always thought the string was compiled into a sub, but I now see that it isn't.

      And thus... you've benchmarked how much overhead the two operations leave( enter( EXPR ) ) have over just EXPR. Neato.

      ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

        What extra call? q{} gets compiled to a sub, so leave(enter()) was already present.

        Furthermore — and my whole point — the subs now close over lexicals @short, @none and @long, whereas the q{} code used @main::short, @main::none and @main::long, all of which were empty. You could use our @short, etc as an alternative, but when I suggested that in the past, it seemed to confuse people.

        Update: oh! I see the code is not compiled into a sub:

        if (ref $c eq 'CODE') {
            $subcode = "sub { for (1 .. $n) { local \$_; package $pack; &\$c; } }";
            $subref  = eval $subcode;
        }
        else {
            $subcode = "sub { for (1 .. $n) { local \$_; package $pack; $c;} }";
            $subref  = _doeval($subcode);
        }

        You'd think that would be mentioned somewhere prominent in the documentation. I can only find a mention in a footnote on another topic.