Battle Royal Grep vs For

Herkum has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Battle Royal Grep vs For by kyle (Abbot) on Mar 19, 2008 at 18:07 UTC
I think your data set is way too small (four elements), and you don't account for differences you might get from matching a different number of times. When benchmarking a list operation, try it with a long list! If you're testing some kind of selection method, see how it works against different proportions of matches. Which is faster when there's nothing to find? Which is faster when everything matches? I tried these out, and I was a little surprised at the results. Basically grep wins when nothing matches, but `for` wins when everything matches. Mostly, though, I'm not sure this matters. A difference as small as that is likely just noise. After all, when I ran this a second time, `for` came out ahead in the non-matching scenario. use List::Util qw( shuffle ); my @match_0 = gen_values( 0 ); my @match_25 = gen_values( 1/4 ); my @match_50 = gen_values( 1/2 ); my @match_75 = gen_values( 3/4 ); my @match_100 = gen_values( 1 ); sub gen_values { my $proportion = shift; my $count = 10_000; my $matching = $count * $proportion; my @out; push @out, gen_matching() while $matching-- > 0; push @out, gen_non_matching() while scalar @out < $count; return shuffle @out; } sub gen_matching { my $out = gen_non_matching(); substr( $out, rand length $out, 1 ) = 'x'; return $out; } sub gen_non_matching { my $out = ''; $out .= int rand 10 while length $out < 100; return $out; } use Benchmark qw( cmpthese ); cmpthese -2, { for_0 => sub { for_loop( \@match_0 ) }, for_25 => sub { for_loop( \@match_25 ) }, for_50 => sub { for_loop( \@match_50 ) }, for_75 => sub { for_loop( \@match_75 ) }, for_100 => sub { for_loop( \@match_100 ) }, grep_0 => sub { grep_loop( \@match_0 ) }, grep_25 => sub { grep_loop( \@match_25 ) }, grep_50 => sub { grep_loop( \@match_50 ) }, grep_75 => sub { grep_loop( \@match_75 ) }, grep_100 => sub { grep_loop( \@match_100 ) }, }; sub for_loop { my @values = @{shift()}; my @matches; for ( @values ) { push @matches, $_ if /x/; } return @matches; } sub grep_loop { my @values = @{shift()}; my @matches = grep /x/, @values; return @matches; } __END__ Rate grep_100 grep_75 for_100 grep_50 for_75 for_50 grep_2 +5 for_25 for_0 grep_0 grep_100 81.9/s -- -11% -18% -20% -25% -30% -31 +% -36% -42% -43% grep_75 92.0/s 12% -- -8% -10% -15% -21% -22 +% -28% -34% -36% for_100 100.0/s 22% 9% -- -2% -8% -14% -15 +% -22% -29% -30% grep_50 102/s 25% 11% 2% -- -6% -12% -13 +% -20% -27% -29% for_75 108/s 32% 18% 9% 6% -- -7% -8 +% -15% -23% -25% for_50 117/s 42% 27% 17% 14% 7% -- -1 +% -9% -17% -19% grep_25 118/s 44% 28% 18% 15% 9% 1% - +- -7% -16% -18% for_25 128/s 56% 39% 28% 25% 18% 9% 8 +% -- -9% -11% for_0 140/s 71% 52% 40% 37% 29% 20% 19 +% 10% -- -3% grep_0 144/s 76% 56% 44% 41% 33% 23% 22 +% 13% 3% -- [download]	[reply] [d/l] [select]
Re^2: Battle Royal Grep vs For by driver8 (Scribe) on Mar 19, 2008 at 19:44 UTC
Just thought I'd muddy the waters a bit more, for fun: cmpthese -2, { for_0 => sub { for_loop( \@match_0 ) }, for_25 => sub { for_loop( \@match_25 ) }, for_50 => sub { for_loop( \@match_50 ) }, for_75 => sub { for_loop( \@match_75 ) }, for_100 => sub { for_loop( \@match_100 ) }, grep_0 => sub { grep_loop( \@match_0 ) }, grep_25 => sub { grep_loop( \@match_25 ) }, grep_50 => sub { grep_loop( \@match_50 ) }, grep_75 => sub { grep_loop( \@match_75 ) }, grep_100 => sub { grep_loop( \@match_100 ) }, fastgrep_0 => sub { fastgrep_loop( \@match_0 ) }, fastgrep_25 => sub { fastgrep_loop( \@match_25 ) }, fastgrep_50 => sub { fastgrep_loop( \@match_50 ) }, fastgrep_75 => sub { fastgrep_loop( \@match_75 ) }, fastgrep_100 => sub { fastgrep_loop( \@match_100 ) }, noretfor_0 => sub { noretfor_loop( \@match_0 ) }, noretfor_25 => sub { noretfor_loop( \@match_25 ) }, noretfor_50 => sub { noretfor_loop( \@match_50 ) }, noretfor_75 => sub { noretfor_loop( \@match_75 ) }, noretfor_100 => sub { noretfor_loop( \@match_100 ) }, noretgrep_0 => sub { noretgrep_loop( \@match_0 ) }, noretgrep_25 => sub { noretgrep_loop( \@match_25 ) }, noretgrep_50 => sub { noretgrep_loop( \@match_50 ) }, noretgrep_75 => sub { noretgrep_loop( \@match_75 ) }, noretgrep_100 => sub { noretgrep_loop( \@match_100 ) }, }; sub for_loop { my @matches; for ( @{$_[0]} ) { push @matches, $_ if /x/; } return @matches; } sub grep_loop { my @matches = grep /x/, @{$_[0]}; return @matches; } sub fastgrep_loop { return grep /x/, @{$_[0]}; } sub noretfor_loop { for (@{$_[0]}) { $_ =~ /x/ } } sub noretgrep_loop { grep /x/, @{$_[0]}; } __END__ (cropped to fit) Rate grep_100 grep_75 for_100 grep_50 for_75 for_50 gre +p_25 for_25 noretfor_75 grep_100 134/s -- -17% -29% -41% -42% -55% +-62% -68% -79% grep_75 161/s 20% -- -15% -29% -30% -47% +-55% -61% -75% for_100 189/s 41% 17% -- -16% -18% -37% +-47% -55% -71% grep_50 226/s 68% 40% 20% -- -2% -25% +-36% -46% -65% for_75 231/s 72% 43% 22% 2% -- -24% +-35% -45% -64% for_50 302/s 125% 87% 60% 33% 31% -- +-15% -28% -54% grep_25 356/s 165% 121% 88% 57% 54% 18% + -- -15% -45% for_25 419/s 212% 160% 121% 85% 82% 39% + 18% -- -36% noretfor_75 650/s 383% 303% 243% 187% 182% 115% + 82% 55% -- noretfor_100 680/s 406% 321% 259% 200% 195% 125% + 91% 62% 5% noretfor_50 693/s 416% 330% 266% 206% 200% 130% + 95% 66% 7% for_0 693/s 416% 330% 266% 206% 200% 130% + 95% 66% 7% noretfor_0 697/s 419% 332% 269% 208% 202% 131% + 96% 66% 7% noretgrep_50 700/s 421% 334% 270% 209% 204% 132% + 96% 67% 8% noretfor_25 701/s 421% 334% 271% 210% 204% 132% + 97% 67% 8% noretgrep_75 736/s 448% 356% 289% 225% 219% 144% +106% 76% 13% fastgrep_0 740/s 451% 359% 291% 227% 221% 145% +108% 77% 14% fastgrep_75 747/s 456% 363% 295% 230% 224% 147% +110% 78% 15% fastgrep_100 752/s 460% 366% 298% 232% 226% 149% +111% 80% 16% noretgrep_25 769/s 472% 376% 306% 240% 233% 154% +116% 84% 18% noretgrep_0 769/s 472% 377% 307% 240% 233% 155% +116% 84% 18% fastgrep_50 777/s 478% 381% 311% 243% 237% 157% +118% 85% 20% noretgrep_100 781/s 481% 384% 313% 245% 238% 159% +119% 86% 20% grep_0 796/s 492% 393% 321% 252% 245% 164% +123% 90% 23% fastgrep_25 801/s 496% 396% 323% 254% 247% 165% +125% 91% 23% [download] I skipped "`my @values = @{shift()};`" in all of them, since this array copy seemed to be a big part of the run time. The point I would take from this is that the performance really depends more on the situation you use them in and how you use them. As kyle says, it's probably best to just use the one that is clearest. For me, that would be fastgrep_loop. -driver8	[reply] [d/l] [select]
Re^3: Battle Royal Grep vs For by kyle (Abbot) on Mar 19, 2008 at 20:37 UTC
You're right, copying does take time, but I'm not sure that's really what's going on here. I tested some more myself. Read more... (3 kB) What this is meant to show is really that grep performs differently depending on context. Benchmark::cmpthese calls the subs you give it in void context, so if you want another context for your code, you need to provide it. This is part of why the subs I wrote are careful to actually collect (and return) results. That's what these would do in the real world. In void context, `grep` doesn't bother to make a list, so it's a lot faster. It's also faster when I forced a scalar context because it again doesn't make a list. In that case, it just counts the matches and passes out the number. You can see that the void and scalar cases are far ahead of all the others. The `grep` in list context without a copy is only slightly faster than the one that makes a copy. Basically it's in the ballpark of copying while all the others (in void and scalar contexts) are in another ballpark.	[reply] [d/l] [select]
Re^2: Battle Royal Grep vs For by Herkum (Parson) on Mar 19, 2008 at 18:21 UTC
I thought about a larger data set, but it did not occur to me to change the number of potential matches. Your tests certainly muddy up the waters a bit, because now I would have to think about the number of expected matches to determine which one to use in certain situations.	[reply]
Re^3: Battle Royal Grep vs For by kyle (Abbot) on Mar 19, 2008 at 18:35 UTC
Actually, my personal opinion is that you should always use the one you find the most comprehensible. In this case, I find grep the easier to understand because it's being used as a filter. In a more complicated real world example, a loop might make more sense, especially if the filtering requires a lot more than a regular expression. It's not time to worry about which one is faster until the one you're using is too slow. Profile your code first and then look for speed improvements in the places that matter. Figuring out the fastest way to do something for its own sake has entertainment value, but it shouldn't be how you decide to write something. Write for those who will read it (and by this, I do not mean the interpreter). Besides all that, in this case, I think testing shows there isn't a significant speed difference. The biggest difference between identical data sets was in the 100 case, and even there the difference was only 22%.	[reply]
Re: Battle Royal Grep vs For by runrig (Abbot) on Mar 19, 2008 at 17:32 UTC
Your benchmark may be relatively fair, but your use of qr is bad. qr is used to compile a regex, so you are compiling a regex on every iteration of your loops. You could assign something like `my $re = qr{test};` before the benchmark and then use `/$re/` in the benchmark, but I think it would be better to just use `/test/` inside the benchmark code, that way you're not measuring things that you don't mean to measure.	[reply] [d/l] [select]
Re^2: Battle Royal Grep vs For by Herkum (Parson) on Mar 19, 2008 at 17:47 UTC
Your point is taken, so I reran the tests and got this, `Rate grep_loop for_loop grep_loop 150375/s -- -22% for_loop 193618/s 29% --` [download] So now they are both faster, but grep is still slower. Than the for loop	[reply] [d/l]
Re^3: Battle Royal Grep vs For by Fletch (Bishop) on Mar 19, 2008 at 17:54 UTC
And as usual `grep EXPR, LIST` that everyone forgets about wins the race. `#!/usr/bin/perl use strict; use warnings; my @values = qw(test test1 test2 test3); use Benchmark qw( cmpthese ); cmpthese -2, { for_loop => sub { my @matches; for (@values) { push @matches, $_ +if $_ =~ /test/ }; }, grep_loop => sub { my @matches = grep { $_ =~ /test/ } @values; + }, grep_expr => sub { my @matches = grep $_ =~ /test/, @values + }, }; __END__ Rate grep_loop for_loop grep_expr grep_loop 248507/s -- -27% -46% for_loop 338839/s 36% -- -27% grep_expr 461521/s 86% 36% --` [download] You owe the oracle a Battle Royale With Cheese and a tasty drink to wash it down with. The cake is a lie. The cake is a lie. The cake is a lie.	[reply] [d/l] [select]
Re^4: Battle Royal Grep vs For by Herkum (Parson) on Mar 19, 2008 at 18:01 UTC