in reply to What is faster?

One situation where I might use an external grep utility in preference to the internal one is if the script produces large volumes of output, and the selection process filters out a large proportion of it. The difference in pure speed terms is likely to be minimal, but the reduction in memory usage from not loading data just to discard it might be worth having.

That said, the amount of memory used by the internal version could be minimised by applying the grep at input rather than afterwards, i.e.:

Update: DO NOT USE THE CODE BELOW! Good idea, bad implementation, as tilly points out below.

open(OUTPUT, "$script |");
my @output = grep { EXPRESSION } <OUTPUT>;

You'd probably need to be discarding a significant amount of input for this to make any great difference, but it wouldn't do any harm in any case, so why not do it anyway?


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


Re: Re: What is faster?
by l2kashe (Deacon) on Jun 02, 2003 at 14:20 UTC
    Hear, hear! ;)

    Seriously though, I have found the internal grep suits my needs very nicely when parsing files, though I haven't done much post-processing of other commands' output. I was honestly amazed at the difference in speed between, say:
    open(IN, "/some/file") || die "Cant open /some/file: $!\n";
    while (<IN>) {
        next unless (m/^$some_match/);
        chomp($capture = $_);
    }
    close(IN);
    as opposed to
    open(IN, "/some/file") || die "Cant open /some/file: $!\n";
    chomp( ($capture) = grep(/^$some_match/, <IN>) );
    close(IN);
    The difference was especially noticeable as the size of the file being processed increased. I'm not sure why, as I haven't gone poking around Perl's internals, but it certainly increased my regular usage of grep.
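One rough way to check this on your own perl is the core Benchmark module. This is only a sketch under stated assumptions: the data is a hypothetical 10,000-line string read through an in-memory filehandle instead of /some/file, and both versions collect every matching line so that they do the same amount of work (the two snippets above differ in which match they keep). Timings will vary with perl version, data and pattern, so treat the numbers as indicative only.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 10,000 lines held in memory instead of /some/file.
my $data       = join '', map { "line$_\n" } 1 .. 10_000;
my $some_match = 'line99';

# Line-at-a-time loop, keeping every matching line.
sub with_while {
    open(my $in, '<', \$data) or die "Can't open in-memory file: $!";
    my @found;
    while (<$in>) {
        push @found, $_ if /^$some_match/;
    }
    close($in);
    return @found;
}

# grep over <$in> in list context, keeping every matching line.
sub with_grep {
    open(my $in, '<', \$data) or die "Can't open in-memory file: $!";
    my @found = grep { /^$some_match/ } <$in>;
    close($in);
    return @found;
}

# Sanity check: both versions must find the same lines before we time them.
my @w = with_while();
my @g = with_grep();
die "results differ" unless "@w" eq "@g";

# Run each version 200 times and print a relative-speed table.
cmpthese(200, { 'while' => \&with_while, 'grep' => \&with_grep });
```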

    MMMMM... Chocolaty Perl Goodness.....

      The obvious guess as to why grep is faster is that it's implemented in C. A Perl built-in is generally going to be faster than the equivalent Perl code because it's compiled down to native machine code (and probably better optimized, to boot). It's the same reason why Perl's built-in lexical sort is faster than supplying an explicit comparison routine (and why you're often better off munging the data beforehand so that you can use the built-in lexical sort; update: sometimes known as the Guttman-Rosler Transform).
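As a hedged illustration of the sort point, here is a small sketch with made-up records of the form name:age. The first sort runs a Perl comparison block for every comparison; the Guttman-Rosler version prepends a packed key so that the plain built-in sort does all the comparing, then strips the key off again. It assumes non-negative integer ages below 2**32, because pack 'N' stores an unsigned 32-bit big-endian value, whose byte order matches numeric order.

```perl
use strict;
use warnings;

# Hypothetical records: "name:age". Goal: sort numerically by age.
my @records = ('carol:41', 'alice:30', 'bob:7');

# Explicit comparison block: Perl code runs for every single comparison.
my @by_block = sort { (split /:/, $a)[1] <=> (split /:/, $b)[1] } @records;

# Guttman-Rosler Transform:
# 1. prepend a 4-byte big-endian key so string order == numeric order,
my @keyed = map { pack('N', (split /:/)[1]) . $_ } @records;
# 2. use the built-in lexical sort with no comparison block,
my @sorted_keyed = sort @keyed;
# 3. strip the 4-byte key to recover the original records.
my @by_grt = map { substr($_, 4) } @sorted_keyed;

print "@by_block\n";
print "@by_grt\n";
```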

      That said, given your example use of while vs. grep, and assuming you're interested in the first or only match, I would recommend the oft-overlooked List::Util function first. It's implemented in C, so it should be as fast as grep (or nearly so), and it has the advantage of stopping as soon as a match is found. It also avoids building a return list that is just thrown away after taking the first item. Note, though, that first { ... } <IN> still evaluates <IN> in list context, so the whole file is read before first starts testing; to save file IO as well, you'd need a while loop that exits with last. List::Util is part of the standard distribution as of 5.8.0 and should be an easy install on previous versions. There are several other useful functions in there too, not to mention the companion module Scalar::Util.
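A minimal sketch of first, using a hypothetical in-memory string in place of /some/file and the /^zy/ pattern from dave's test below. It also shows the while-with-last alternative, which stops reading the input at the first match rather than only stopping the test.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical sample data standing in for /some/file.
my $data = "zebra\nzygote\nzymurgy\napple\n";

# first: returns the first element for which the block is true, then stops testing.
# <$fh> in list context still reads every line before first() sees any of them.
open(my $fh, '<', \$data) or die "Can't open in-memory file: $!";
my $capture = first { /^zy/ } <$fh>;
close($fh);
chomp $capture;

# To stop *reading* at the first match, loop line by line and bail out with last.
open($fh, '<', \$data) or die "Can't open in-memory file: $!";
my $early;
while (my $line = <$fh>) {
    if ($line =~ /^zy/) {
        chomp($early = $line);
        last;
    }
}
close($fh);

print "$capture $early\n";
```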

      bbfu
      Black flowers blossom
      Fearless on my breath

      l2kashe:

      Your two snippets don't seem to be equivalent. The first (while loop), when run on my dictionary file with the pattern /^zy/, sets $capture to 'zymurgy' (the last match), while the second (grep) sets it to 'zygote' (the first match).

      I understand why the while loop behaves as it does; perhaps someone could explain why grep works differently?

      TIA, dave

        Because the two code snippets aren't quite the same. A better comparison would be:
        open(IN, "/some/file") || die "Cant open /some/file: $!\n";
        while (<IN>) {
            push(@foo, $_) if ( m/^$match/ );
        }
        close(IN);

        # as opposed to

        open(IN, "/some/file") || die "Cant access /some/file: $!\n";
        @foo = grep(m/^$match/, <IN>);
        close(IN);
        The differences between the loops in my first post are as follows:

        The while loop iterates over every line of input, overwriting $capture each time a line matches the regex. So if three lines match, $capture ends up holding the last one.

        In the grep version, my (excessive) parentheses around $capture made it a list assignment, so grep returns a list of all the matches, but only the first element is captured. It's along the same lines as, say:
        $f = 'foo:bar:baz';
        ($blah) = split(/:/, $f);
        $blah now contains 'foo'; 'bar' and 'baz' are silently discarded. In the same way, grep finds all the matches and returns them as a list, but I only grab the first value and toss the rest away. Use the first snippet in this post as a better example of the differences. Also remember that I was talking about parsing files themselves, not directories; grep really begins to outperform the loop as the data set gets larger.

        The loop gives you far finer-grained control over what to do with each item you iterate over. If you plan on doing different things with different pieces of the data, it is more efficient to walk the data set once in a loop. If you already know you want a specific piece of data, and only need to process the data set once, then grep is your friend.
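A small sketch of the first-match versus last-match behaviour dave noticed, with a hypothetical four-word list standing in for the dictionary file (each entry keeps its trailing newline, as lines read from a file would). The last assignment shows one way to get the while loop's answer from grep, by taking element [-1] of its result list.

```perl
use strict;
use warnings;

# Hypothetical dictionary excerpt; each "line" keeps its newline like <IN> would.
my @words = ("zebra\n", "zygote\n", "zymurgy\n", "apple\n");

# Loop style: $last is overwritten on every match, so it ends with the LAST one.
my $last;
for (@words) {
    next unless /^zy/;
    chomp($last = $_);
}

# List assignment: grep returns ALL matches, and ($first) keeps element 0 only.
my ($first) = grep { /^zy/ } @words;
chomp $first;

# Taking the final element of grep's list reproduces the loop's result.
my $last_via_grep = (grep { /^zy/ } @words)[-1];
chomp $last_via_grep;

print "$first $last $last_via_grep\n";
```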

Re: Re: What is faster?
by tilly (Archbishop) on Jun 03, 2003 at 04:55 UTC
    The theory is right, the application less so.

    The <OUTPUT> construct immediately sucks the whole file into memory. To get the full space savings that you describe (not using variables will get you some...), you need to do something like this:

    open(OUTPUT, "$script |") or die "Cannot run '$script': $!";
    my @output;
    while (<OUTPUT>) {
        push @output, $_ if /EXPRESSION/;
    }
    With the emphasis on creative laziness that they keep talking about for Perl 6, it is possible that Perl 6 will automatically save memory for you with your construct. Ruby definitely does. But in Perl 5, you need to write things out longhand.
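A runnable sketch of the longhand filter-while-reading approach. Assumptions: a child perl ($^X) printing the numbers 1 to 10 stands in for $script, and a keep-only-even-numbers regex stands in for EXPRESSION. It uses the list form of a pipe open, which avoids the shell but may not be supported by older perls on Windows.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $script: a child perl printing 1..10, one per line.
my @cmd = ($^X, '-e', 'print "$_\n" for 1 .. 10');

# List-form pipe open (no shell); the child's output is read as a stream.
open(my $out, '-|', @cmd) or die "Cannot run '@cmd': $!";

my @output;
while (my $line = <$out>) {            # scalar context: one line per iteration
    push @output, $line if $line =~ /^\d*[02468]$/;   # keep even numbers only
}
close($out) or die "'@cmd' exited with status $?";

chomp @output;
print "@output\n";
```

Only the matching lines are ever stored in @output; rejected lines are discarded as soon as they are read.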

    UPDATE Zaxo pointed out that I didn't close the diamond properly. Fixed.

      Good point, crap implementation. Thanks for pointing it out.

