mldvx4 has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, PerlMonks!

I'm trying to figure out if foreach is faster than grep with map. I've replaced grep and map in a large script with foreach and noticed a substantial speed improvement, so my guess is that it is faster. However, I'd like to verify that somehow.

With the two scripts below, am I comparing the same things and not apples to oranges? Or have I misunderstood grep and map?

#!/usr/bin/perl
use strict;
use warnings;

my @c = (1 .. 10000000);

my @d;
foreach my $dd (@c) {
    push(@d, $dd % 2);
}

my @e;
foreach my $ee (@d) {
    if (!$ee) {
        push(@e, $ee);
    }
}

exit(0);

The script above is much faster than the one below, according to time.

#!/usr/bin/perl
use strict;
use warnings;

my @c = (1 .. 10000000);
my @d = map( $_ % 2, @c );
my @e = grep( /0/, @d );

exit(0);

Replies are listed 'Best First'.
Re: Speed comparison of foreach vs grep + map
by Corion (Patriarch) on May 25, 2025 at 18:01 UTC

    Without looking at your timings, the second script is doing something different than the first script:

    my @e = grep(/0/, @d);

    This invokes the regex engine, matching any value whose string form contains a "0", instead of simply testing $_ for truth.

    Maybe you meant to write something like:

    my @e = grep(!$_, @d);
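
    To see concretely that the two filters are not equivalent, here is a small illustrative one-liner (not part of the original reply): /0/ keeps any value whose string form contains the character "0", such as 10 or 103, while !$_ keeps only false values. In the OP's scripts the array holds only 0s and 1s, so both filters happen to select the same elements there; the difference is the cost of running the regex engine ten million times.

    ```shell
    perl -le'
        my @vals = ( 10, 0, 1, 103 );
        # regex match: keeps anything containing the character "0"
        print join ",", grep /0/, @vals;    # 10,0,103
        # truth test: keeps only false values
        print join ",", grep !$_, @vals;    # 0
    '
    ```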
Re: Speed comparison of foreach vs grep + map
by ikegami (Patriarch) on May 25, 2025 at 18:28 UTC

    As Corion pointed out, the switch from !$_ to /0/ is overpowering the other differences and invalidating your test.

    But other improvements can be made.


    There's no reason to use an intermediate array in the latter snippet. Chaining map into grep avoids building the ten-million-element temporary array @d.

    my @e = grep !$_, map $_ % 2, @c;

    You could even eliminate the grep.

    my @e = map $_ % 2 ? () : 0, @c;

    Similar improvements can be made for the foreach version.

    my @e; for my $c ( @c ) { my $d = $c % 2; push @e, $d if !$d; }
    my @e; for my $c ( @c ) { push @e, 0 if !( $c % 2 ); }

    Finally, I'm curious how these compare:

    my @e = ( 0 ) x grep !( $_ % 2 ), @c;
    my @e; push @e, 0 for 1 .. grep !( $_ % 2 ), @c;

      Benchmarks:

      #!/usr/bin/perl
      use strict;
      use warnings;

      use Benchmark qw( cmpthese );

      my @c = 1 .. 1_000_000;

      cmpthese( -3, {
          for      => sub { my @e; for my $c ( @c ) { my $d = $c % 2; push @e, $d if !$d; } },
          for2     => sub { my @e; for my $c ( @c ) { push @e, 0 if !( $c % 2 ); } },
          map_grep => sub { my @e = grep !$_, map $_ % 2, @c; },
          map      => sub { my @e = map $_ % 2 ? () : 0, @c; },
          grep_x   => sub { my @e = ( 0 ) x grep !( $_ % 2 ), @c; },
          grep_for => sub { my @e; push @e, 0 for 1 .. grep !( $_ % 2 ), @c; },
      } );
                  Rate map_grep   for grep_for   map  for2 grep_x
      map_grep  10.6/s       --  -27%     -43%  -47%  -51%   -54%
      for       14.6/s      38%    --     -21%  -27%  -33%   -36%
      grep_for  18.4/s      75%   27%       --   -7%  -15%   -20%
      map       19.9/s      88%   36%       8%    --   -9%   -13%
      for2      21.7/s     106%   49%      18%    9%    --    -5%
      grep_x    22.9/s     117%   57%      24%   15%    6%     --

                  Rate map_grep   for grep_for   map  for2 grep_x
      map_grep  12.2/s       --  -16%     -34%  -40%  -45%   -49%
      for       14.6/s      20%    --     -22%  -29%  -35%   -39%
      grep_for  18.6/s      52%   27%       --   -9%  -17%   -23%
      map       20.5/s      68%   40%      10%    --   -8%   -15%
      for2      22.4/s      83%   53%      20%    9%    --    -7%
      grep_x    24.1/s      97%   65%      29%   17%    8%     --

                  Rate map_grep   for grep_for   map  for2 grep_x
      map_grep  12.1/s       --  -19%     -31%  -36%  -45%   -49%
      for       14.8/s      23%    --     -15%  -21%  -32%   -37%
      grep_for  17.4/s      44%   17%       --   -8%  -21%   -27%
      map       18.8/s      56%   27%       8%    --  -14%   -21%
      for2      21.9/s      82%   48%      26%   16%    --    -8%
      grep_x    23.7/s      97%   60%      37%   26%    8%     --
Re: Speed comparison of foreach vs grep + map
by GrandFather (Saint) on May 25, 2025 at 21:23 UTC

    Do you have a benchmarked application where the benchmark is showing that foreach/grep,map is the specific bottleneck and causing tens of seconds of extra run time? Because if not, choose the best tool to make the code as easy as possible to understand.

    You will usually save a lot more time writing maintainable code than seeking the fastest runtime you can manage.

    If you do need to improve runtime you should try to be smart rather than clever. A smart solution is likely to approach the problem from an entirely different direction. A clever solution is likely to be a nasty mess arrived at by iterative tweaking the code to squeeze the last drop of performance out of a sub-optimum algorithm.

    Optimising for fewest key strokes only makes sense when transmitting to Pluto or beyond.

      If you do need to improve runtime you should try to be smart rather than clever. A smart solution is likely to approach the problem from an entirely different direction. A clever solution is likely to be a nasty mess arrived at by iterative tweaking the code to squeeze the last drop of performance out of a sub-optimum algorithm.

      -- GrandFather

      Love it! ... so much that I could not restrain myself from adding it to On Code Optimization and Performance References. :-)

      BTW, my favourite quote from that node is from Michael Abrash:

      Without good design, good algorithms, and complete understanding of the program's operation, your carefully optimized code will amount to one of mankind's least fruitful creations -- a fast slow program.

      👁️🍾👍🦟
      Do you have a benchmarked application where the benchmark is showing that foreach/grep,map is the specific bottleneck and causing tens of seconds of extra run time?

      I am processing some medium-sized files (1.6 MB to 2.5 MB) and notice that the run time goes up from ~3 s to 5+ s with grep and map compared to foreach. However, I see now that I can tune things better.

Re: Speed comparison of foreach vs grep + map
by NERDVANA (Priest) on May 26, 2025 at 17:26 UTC

    For large arrays, yes grep and map are slower than iterating the array. If you really get down into the weeds of perl performance, you'll find that any curly brace scope slows things down as well, so grep { $condition } @list is slower than grep $condition, @list.
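
    The block-versus-expression difference is easy to measure with the core Benchmark module. A minimal sketch (not from the original post; absolute numbers will vary by machine and perl version):

    ```shell
    perl -MBenchmark=cmpthese -e'
        my @c = 1 .. 100_000;
        cmpthese( -1, {
            # curly-brace block form: enters a scope for every element
            grep_block => sub { my @e = grep { !( $_ % 2 ) } @c },
            # expression form: no block, usually a little faster
            grep_expr  => sub { my @e = grep !( $_ % 2 ), @c },
        } );
    '
    ```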

    But, the other posts here are the important bit - focus on optimizing your algorithm before you try to squeeze a few extra percent out of the operators you use. Knowing that map and grep are a bit slower doesn't stop me from using them in my everyday code, because most of my (primarily webapp) code is author-speed limited, not cpu-speed limited. In the rare cases where I want something to run faster, I whip out Devel::NYTProf and then look for the hotspots. Even then, most of the time the report clues me into problems with my algorithm, and I don't need to optimize expressions.

Re: Speed comparison of foreach vs grep + map
by jwkrahn (Abbot) on May 26, 2025 at 03:31 UTC
    $ perl -le'
        use warnings;
        use strict;
        my @c = 1 .. 10_000_000;
        print @c . " @c[0..19]";
        my @d;
        for my $dd ( @c ) { push @d, $dd % 2; }
        print @d . " @d[0..19]";
        my @e;
        for my $ee ( @d ) { if ( !$ee ) { push @e, $ee; } }
        print @e . " @e[0..19]";
    '
    10000000 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
    10000000 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
    5000000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    $ perl -le'
        use warnings;
        use strict;
        my @c = 1 .. 10_000_000;
        print @c . " @c[0..19]";
        my @d = map $_ % 2, @c;
        print @d . " @d[0..19]";
        my @e = grep /0/, @d;
        print @e . " @e[0..19]";
    '
    10000000 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
    10000000 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
    5000000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

    The quickest way to do what you want:

    $ perl -le'
        my @c = 1 .. 10_000_000;
        print @c . " @c[0..19]";
        my @e = ( 0 ) x ( @c / 2 );
        print @e . " @e[0..19]";
    '
    10000000 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
    5000000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    Naked blocks are fun! -- Randal L. Schwartz, Perl hacker

      It's ...odd to assume the array contains 1..N for the last one, but not for the first two.

Re: Speed comparison of foreach vs grep + map
by LanX (Saint) on May 26, 2025 at 16:47 UTC
    I think this is an XY problem

    You seem to be processing files, and those operations are in general orders of magnitude slower than any loop construct.

    For instance slurping instead of iterating can make a huge difference.

    So I hope you are not comparing apples with oranges.
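
    For illustration, slurping means reading the whole file in one operation rather than line by line. A minimal, self-contained sketch (the temp file name is made up for the demo):

    ```shell
    printf 'line 1\nline 2\n' > /tmp/slurp_demo.txt
    perl -e'
        open my $fh, "<", "/tmp/slurp_demo.txt" or die "open: $!";
        # locally undef-ing $/ makes <$fh> return the whole file at once
        my $content = do { local $/; <$fh> };
        print length($content), " bytes\n";    # 14 bytes
    '
    ```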

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    see Wikisyntax for the Monastery

      Sorry for the delay.

      I do read in the whole file at once and pass it as a variable to XML::Twig, making three consecutive passes. Here are the first, second, and final passes. The first pass just collects some information; the second pass actually starts processing the elements. There are a few map and grep functions used in the various handlers (not listed):

      ...
      my $xml = XML::Twig->new(
          pretty_print    => 'nsgmls',   # nsgmls for parsability
          output_encoding => 'UTF-8',
          twig_roots      => { 'office:automatic-styles' => 1 },
          twig_handlers   => {
              'style:style[@style:family="text"]/style:text-properties'
                  => \&handler_style_collector,
              'style:style' => \&handler_paragraph_style_collector,
          },
      );
      # $content is not saved from the first pass, it only builds some hashes
      $xml->parse($content);
      $xml->dispose;

      $xml = XML::Twig->new(
          pretty_print    => 'nsgmls',   # nsgmls for parsability
          output_encoding => 'UTF-8',
          twig_roots      => { 'office:body' => 1 },
          twig_handlers   => {
              # link anchors (text:bookmark) must be handled before
              # processing the internal links (text:a)
              '*[text:bookmark]' => \&handler_bookmark,
              'text:note[@text:note-class="footnote"]/text:note-body'
                  => \&handler_footnotes,
              'text:note-citation' => \&handler_citation,  # only some kinds
              'text:span'          => \&handler_span,      # typographic markup
              'text:list-item'     => \&handler_list_item, # all lists become unordered
              'table:table-header-rows' => \&handler_table_header_rows,
              'table:table-row'    => \&handler_table_row,
              'table:table'        => \&handler_table,     # primitive table support
              'text:line-break'    => \&handler_line_break,
              'text:table-of-content'   => sub { $_->delete },
              'text:index-body'         => sub { $_->delete },
              'text:alphabetical-index' => sub { $_->delete },
          },
      );
      $xml->parse($content);
      $content = $xml->sprint;
      $xml->dispose;

      $xml = XML::Twig->new(
          pretty_print    => 'nsgmls',
          empty_tags      => 'html',
          output_encoding => 'UTF-8',
          twig_roots      => { 'office:body' => 1 },
          twig_handlers   => {
              # links (text:a) must be handled after the link targets (text:bookmark)
              'text:a'       => \&handler_links,
              'text:h'       => \&handler_h,
              'text:p'       => \&handler_p,
              'draw:frame'   => \&handler_draw_frame,
              'office:annotation'     => sub { $_->delete },
              'office:annotation-end' => sub { $_->delete },
              'text:sequence-decls'   => sub { $_->delete },
              'text:tracked-changes'  => sub { $_->delete },
              'text:table-of-content' => sub { $_->delete },
              'office:forms'          => sub { $_->delete },
              'text:list'    => \&handler_lift_up,
              'text:section' => \&handler_lift_up,
              'office:body'  => \&handler_lift_up,
              'office:text'  => \&handler_lift_up,
          },
      );
      $xml->parse($content);
      $content = $xml->sprint;
      $xml->dispose;
      ...

      The first pass is necessary to collect some information about typographical markup before actual processing begins. Then there are some manipulations which have to be kept separate for the second and third passes, but that helps because spreading some of the other handlers across the two passes seems to speed up processing.

      Then there is some regex tidying of the markdown stored in $content afterward. It seemed best to do that as regex.

      It's a bit moot at this point, though: while the script does the job quite nicely on the use-cases I've tried it on, the person I wrote it for has access to additional data (for his specific use-case), so he was inspired to write a second version which is more tightly coupled with that data.