in reply to Battle Royal Grep vs For

I think your data set is way too small (four elements), and you don't account for differences you might get from matching a different number of times.

When benchmarking a list operation, try it with a long list! If you're testing some kind of selection method, see how it works against different proportions of matches. Which is faster when there's nothing to find? Which is faster when everything matches?

I tried these out, and I was a little surprised at the results. Basically grep wins when nothing matches, but for wins when everything matches.

Mostly, though, I'm not sure this matters. A difference as small as that is likely just noise. After all, when I ran this a second time, for came out ahead in the non-matching scenario.

use List::Util qw( shuffle ); my @match_0 = gen_values( 0 ); my @match_25 = gen_values( 1/4 ); my @match_50 = gen_values( 1/2 ); my @match_75 = gen_values( 3/4 ); my @match_100 = gen_values( 1 ); sub gen_values { my $proportion = shift; my $count = 10_000; my $matching = $count * $proportion; my @out; push @out, gen_matching() while $matching-- > 0; push @out, gen_non_matching() while scalar @out < $count; return shuffle @out; } sub gen_matching { my $out = gen_non_matching(); substr( $out, rand length $out, 1 ) = 'x'; return $out; } sub gen_non_matching { my $out = ''; $out .= int rand 10 while length $out < 100; return $out; } use Benchmark qw( cmpthese ); cmpthese -2, { for_0 => sub { for_loop( \@match_0 ) }, for_25 => sub { for_loop( \@match_25 ) }, for_50 => sub { for_loop( \@match_50 ) }, for_75 => sub { for_loop( \@match_75 ) }, for_100 => sub { for_loop( \@match_100 ) }, grep_0 => sub { grep_loop( \@match_0 ) }, grep_25 => sub { grep_loop( \@match_25 ) }, grep_50 => sub { grep_loop( \@match_50 ) }, grep_75 => sub { grep_loop( \@match_75 ) }, grep_100 => sub { grep_loop( \@match_100 ) }, }; sub for_loop { my @values = @{shift()}; my @matches; for ( @values ) { push @matches, $_ if /x/; } return @matches; } sub grep_loop { my @values = @{shift()}; my @matches = grep /x/, @values; return @matches; } __END__ Rate grep_100 grep_75 for_100 grep_50 for_75 for_50 grep_2 +5 for_25 for_0 grep_0 grep_100 81.9/s -- -11% -18% -20% -25% -30% -31 +% -36% -42% -43% grep_75 92.0/s 12% -- -8% -10% -15% -21% -22 +% -28% -34% -36% for_100 100.0/s 22% 9% -- -2% -8% -14% -15 +% -22% -29% -30% grep_50 102/s 25% 11% 2% -- -6% -12% -13 +% -20% -27% -29% for_75 108/s 32% 18% 9% 6% -- -7% -8 +% -15% -23% -25% for_50 117/s 42% 27% 17% 14% 7% -- -1 +% -9% -17% -19% grep_25 118/s 44% 28% 18% 15% 9% 1% - +- -7% -16% -18% for_25 128/s 56% 39% 28% 25% 18% 9% 8 +% -- -9% -11% for_0 140/s 71% 52% 40% 37% 29% 20% 19 +% 10% -- -3% grep_0 144/s 76% 56% 44% 41% 33% 23% 22 +% 13% 3% --

Replies are listed 'Best First'.
Re^2: Battle Royal Grep vs For
by driver8 (Scribe) on Mar 19, 2008 at 19:44 UTC

    Just thought I'd muddy the waters a bit more, for fun:

    cmpthese -2, { for_0 => sub { for_loop( \@match_0 ) }, for_25 => sub { for_loop( \@match_25 ) }, for_50 => sub { for_loop( \@match_50 ) }, for_75 => sub { for_loop( \@match_75 ) }, for_100 => sub { for_loop( \@match_100 ) }, grep_0 => sub { grep_loop( \@match_0 ) }, grep_25 => sub { grep_loop( \@match_25 ) }, grep_50 => sub { grep_loop( \@match_50 ) }, grep_75 => sub { grep_loop( \@match_75 ) }, grep_100 => sub { grep_loop( \@match_100 ) }, fastgrep_0 => sub { fastgrep_loop( \@match_0 ) }, fastgrep_25 => sub { fastgrep_loop( \@match_25 ) }, fastgrep_50 => sub { fastgrep_loop( \@match_50 ) }, fastgrep_75 => sub { fastgrep_loop( \@match_75 ) }, fastgrep_100 => sub { fastgrep_loop( \@match_100 ) }, noretfor_0 => sub { noretfor_loop( \@match_0 ) }, noretfor_25 => sub { noretfor_loop( \@match_25 ) }, noretfor_50 => sub { noretfor_loop( \@match_50 ) }, noretfor_75 => sub { noretfor_loop( \@match_75 ) }, noretfor_100 => sub { noretfor_loop( \@match_100 ) }, noretgrep_0 => sub { noretgrep_loop( \@match_0 ) }, noretgrep_25 => sub { noretgrep_loop( \@match_25 ) }, noretgrep_50 => sub { noretgrep_loop( \@match_50 ) }, noretgrep_75 => sub { noretgrep_loop( \@match_75 ) }, noretgrep_100 => sub { noretgrep_loop( \@match_100 ) }, }; sub for_loop { my @matches; for ( @{$_[0]} ) { push @matches, $_ if /x/; } return @matches; } sub grep_loop { my @matches = grep /x/, @{$_[0]}; return @matches; } sub fastgrep_loop { return grep /x/, @{$_[0]}; } sub noretfor_loop { for (@{$_[0]}) { $_ =~ /x/ } } sub noretgrep_loop { grep /x/, @{$_[0]}; } __END__ (cropped to fit) Rate grep_100 grep_75 for_100 grep_50 for_75 for_50 gre +p_25 for_25 noretfor_75 grep_100 134/s -- -17% -29% -41% -42% -55% +-62% -68% -79% grep_75 161/s 20% -- -15% -29% -30% -47% +-55% -61% -75% for_100 189/s 41% 17% -- -16% -18% -37% +-47% -55% -71% grep_50 226/s 68% 40% 20% -- -2% -25% +-36% -46% -65% for_75 231/s 72% 43% 22% 2% -- -24% +-35% -45% -64% for_50 302/s 125% 87% 60% 33% 31% -- +-15% -28% -54% grep_25 356/s 165% 121% 88% 57% 54% 18% + -- -15% -45% for_25 419/s 212% 160% 121% 85% 82% 39% + 18% -- -36% noretfor_75 650/s 383% 303% 243% 187% 182% 115% + 82% 55% -- noretfor_100 680/s 406% 321% 259% 200% 195% 125% + 91% 62% 5% noretfor_50 693/s 416% 330% 266% 206% 200% 130% + 95% 66% 7% for_0 693/s 416% 330% 266% 206% 200% 130% + 95% 66% 7% noretfor_0 697/s 419% 332% 269% 208% 202% 131% + 96% 66% 7% noretgrep_50 700/s 421% 334% 270% 209% 204% 132% + 96% 67% 8% noretfor_25 701/s 421% 334% 271% 210% 204% 132% + 97% 67% 8% noretgrep_75 736/s 448% 356% 289% 225% 219% 144% +106% 76% 13% fastgrep_0 740/s 451% 359% 291% 227% 221% 145% +108% 77% 14% fastgrep_75 747/s 456% 363% 295% 230% 224% 147% +110% 78% 15% fastgrep_100 752/s 460% 366% 298% 232% 226% 149% +111% 80% 16% noretgrep_25 769/s 472% 376% 306% 240% 233% 154% +116% 84% 18% noretgrep_0 769/s 472% 377% 307% 240% 233% 155% +116% 84% 18% fastgrep_50 777/s 478% 381% 311% 243% 237% 157% +118% 85% 20% noretgrep_100 781/s 481% 384% 313% 245% 238% 159% +119% 86% 20% grep_0 796/s 492% 393% 321% 252% 245% 164% +123% 90% 23% fastgrep_25 801/s 496% 396% 323% 254% 247% 165% +125% 91% 23%
    I skipped "my @values = @{shift()};" in all of them, since this array copy seemed to be a big part of the run time. The point I would take from this is that the performance really depends more on the situation you use them in and how you use them. As kyle says, it's probably best to just use the one that is clearest. For me, that would be fastgrep_loop.
    -driver8

      You're right, copying does take time, but I'm not sure that's really what's going on here. I tested some more myself.

      What this is meant to show is really that grep performs differently depending on context.

      Benchmark::cmpthese calls the subs you give it in void context, so if you want another context for your code, you need to provide it. This is part of why the subs I wrote are careful to actually collect (and return) results. That's what these would do in the real world.

      In void context, grep doesn't bother to make a list, so it's a lot faster. It's also faster when I forced a scalar context because it again doesn't make a list. In that case, it just counts the matches and passes out the number. You can see that the void and scalar cases are far ahead of all the others.

      The grep in list context without a copy is only slightly faster than the one that makes a copy. Basically it's in the ballpark of copying while all the others (in void and scalar contexts) are in another ballpark.

Re^2: Battle Royal Grep vs For
by Herkum (Parson) on Mar 19, 2008 at 18:21 UTC

    I thought about a larger data set, but it did not occur to me to change the number of potential matches.

    Your tests certainly muddy up the waters a bit, because now I would have to think about the number of expected matches to determine which one to use in certain situations.

      Actually, my personal opinion is that you should always use the one you find the most comprehensible. In this case, I find grep the easier to understand because it's being used as a filter. In a more complicated real world example, a loop might make more sense, especially if the filtering requires a lot more than a regular expression.

      It's not time to worry about which one is faster until the one you're using is too slow. Profile your code first and then look for speed improvements in the places that matter.

      Figuring out the fastest way to do something for its own sake has entertainment value, but it shouldn't be how you decide to write something. Write for those who will read it (and by this, I do not mean the interpreter).

      Besides all that, in this case, I think testing shows there isn't a significant speed difference. The biggest difference between identical data sets was in the 100 case, and even there the difference was only 22%.