Herkum has asked for the wisdom of the Perl Monks concerning the following question:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my @values = qw(test test1 test2 test3);

    use Benchmark qw( cmpthese );

    cmpthese -2, {
        for_loop => sub {
            my @matches;
            for (@values) { push @matches, $_ if $_ =~ qr{test} };
        },
        grep_loop => sub {
            my @matches = grep { $_ =~ qr{test} } @values;
        },
    };

    __END__
                 Rate grep_loop  for_loop
    grep_loop 66674/s        --      -10%
    for_loop  74136/s       11%        --

    This is perl, v5.8.8 built for i486-linux-gnu-thread-multi
I was a little surprised by this, and I was wondering whether there was something I was missing. Any comments?
Re: Battle Royal Grep vs For
by kyle (Abbot) on Mar 19, 2008 at 18:07 UTC
I think your data set is way too small (four elements), and you don't account for differences you might get from matching a different number of times.

When benchmarking a list operation, try it with a long list! If you're testing some kind of selection method, see how it works against different proportions of matches. Which is faster when there's nothing to find? Which is faster when everything matches?

I tried these out, and I was a little surprised at the results. Basically grep wins when nothing matches, but for wins when everything matches.

Mostly, though, I'm not sure this matters. A difference as small as that is likely just noise. After all, when I ran this a second time, for came out ahead in the non-matching scenario.
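A minimal sketch of the kind of test described above, not kyle's actual benchmark: it times both approaches on a longer list at three match proportions. The helper `make_list`, the list size of 10,000 and the proportions 0/50/100 are assumptions chosen for illustration, and the pattern is compiled once outside the timed subs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Hypothetical helper: build $size strings, of which $match_pct percent
# contain the substring "test".
sub make_list {
    my ( $size, $match_pct ) = @_;
    return map { ( $_ % 100 < $match_pct ) ? "test$_" : "other$_" } 0 .. $size - 1;
}

my $re = qr{test};    # compiled once, outside the timed code

for my $match_pct ( 0, 50, 100 ) {
    my @values = make_list( 10_000, $match_pct );
    print "\n$match_pct% matching, ", scalar(@values), " elements\n";

    cmpthese -1, {
        for_loop => sub {
            my @matches;
            for (@values) { push @matches, $_ if $_ =~ $re }
            return scalar @matches;    # make sure the result is used
        },
        grep_loop => sub {
            my @matches = grep { $_ =~ $re } @values;
            return scalar @matches;
        },
    };
}
```

Running it several times is worthwhile, since differences of a few percent can change sign between runs.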
by driver8 (Scribe) on Mar 19, 2008 at 19:44 UTC
Just thought I'd muddy the waters a bit more, for fun: I skipped "my @values = @{shift()};" in all of them, since this array copy seemed to be a big part of the run time. The point I would take from this is that the performance really depends more on the situation you use them in and how you use them. As kyle says, it's probably best to just use the one that is clearest. For me, that would be fastgrep_loop.
-driver8
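To illustrate the array copy mentioned above, here is a small sketch contrasting a sub that copies its input with one that filters through the reference directly. The sub names are hypothetical and this is not driver8's code; it only shows the two calling styles giving the same result.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @values = qw(test test1 other test3);

# Copies the whole array before filtering, as in "my @values = @{shift()};".
sub filter_with_copy {
    my @copy = @{ shift() };
    return grep { /test/ } @copy;
}

# Filters through the reference directly; no intermediate copy is made.
sub filter_without_copy {
    my ($values_ref) = @_;
    return grep { /test/ } @$values_ref;
}

my @copied = filter_with_copy( \@values );
my @direct = filter_without_copy( \@values );
print "same result\n" if "@copied" eq "@direct";
```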
by kyle (Abbot) on Mar 19, 2008 at 20:37 UTC
You're right, copying does take time, but I'm not sure that's really what's going on here. I tested some more myself.
What this is meant to show is really that grep performs differently depending on context. Benchmark::cmpthese calls the subs you give it in void context, so if you want another context for your code, you need to provide it. This is part of why the subs I wrote are careful to actually collect (and return) results. That's what these would do in the real world.

In void context, grep doesn't bother to make a list, so it's a lot faster. It's also faster when I forced a scalar context because it again doesn't make a list. In that case, it just counts the matches and passes out the number.

You can see that the void and scalar cases are far ahead of all the others. The grep in list context without a copy is only slightly faster than the one that makes a copy. Basically it's in the ballpark of copying while all the others (in void and scalar contexts) are in another ballpark.
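The context behaviour described above can be seen in a few lines. This is an illustrative sketch rather than the benchmark from the post: `note_context` is a hypothetical helper that uses `wantarray` to record the context it was called in, and the sample data is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @values = qw(test test1 other test3);

my @matches = grep { /test/ } @values;    # list context: builds the list
my $count   = grep { /test/ } @values;    # scalar context: only a count

print "list:  @matches\n";                # list:  test test1 test3
print "count: $count\n";                  # count: 3

# Hypothetical helper: record the context of each call via wantarray.
my @seen;
sub note_context {
    my $want = wantarray;
    push @seen, !defined $want ? 'void' : $want ? 'list' : 'scalar';
    return;
}

note_context();                           # void context
my $scalar_call = note_context();         # scalar context
my @list_call   = note_context();         # list context

print "contexts seen: @seen\n";           # contexts seen: void scalar list
```

Because the last expression of a sub is evaluated in the caller's context, a benchmarked sub that ends in a bare `grep` inherits the void context from the benchmark harness. Assigning to an array inside the sub forces list context instead.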
by Herkum (Parson) on Mar 19, 2008 at 18:21 UTC
I thought about a larger data set, but it did not occur to me to change the number of potential matches. Your tests certainly muddy the waters a bit, because now I would have to think about the number of expected matches to decide which one to use in a given situation.
by kyle (Abbot) on Mar 19, 2008 at 18:35 UTC
Actually, my personal opinion is that you should always use the one you find the most comprehensible. In this case, I find grep the easier to understand because it's being used as a filter. In a more complicated real world example, a loop might make more sense, especially if the filtering requires a lot more than a regular expression.

It's not time to worry about which one is faster until the one you're using is too slow. Profile your code first and then look for speed improvements in the places that matter. Figuring out the fastest way to do something for its own sake has entertainment value, but it shouldn't be how you decide to write something. Write for those who will read it (and by this, I do not mean the interpreter).

Besides all that, in this case, I think testing shows there isn't a significant speed difference. The biggest difference between identical data sets was in the 100 case, and even there the difference was only 22%.
Re: Battle Royal Grep vs For
by runrig (Abbot) on Mar 19, 2008 at 17:32 UTC
by Herkum (Parson) on Mar 19, 2008 at 17:47 UTC
Your point is taken, so I reran the tests and got this,
So now they are both faster, but grep is still slower than the for loop.
by Fletch (Bishop) on Mar 19, 2008 at 17:54 UTC
And as usual grep EXPR, LIST that everyone forgets about wins the race.
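For readers who have not met it, the two documented forms of `grep` look like this. This is a small sketch with made-up data, not Fletch's benchmark; it only shows that both forms select the same elements. The speed advantage of the expression form is usually attributed to it not entering a block scope for every element, though the size of that effect is best measured on your own perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @values = qw(test test1 other test3);

my @block_form = grep { /test/ } @values;    # grep BLOCK LIST
my @expr_form  = grep /test/, @values;       # grep EXPR, LIST (note the comma)

print "block: @block_form\n";                # block: test test1 test3
print "expr:  @expr_form\n";                 # expr:  test test1 test3
```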
You owe the oracle a Battle Royale With Cheese and a tasty drink to wash it down with.
The cake is a lie.
by Herkum (Parson) on Mar 19, 2008 at 18:01 UTC