radiantmatrix has asked for the wisdom of the Perl Monks concerning the following question:

I've been doing some benchmarking, mostly to learn how to use Benchmark, but also to test various intensive operations in some production code, and try to streamline them.

One of my bottlenecks involves code that copies a large array and then makes substitutions (via s///) to each element in the new copy. The code in question uses an unusual way of calling map{} to accomplish this (see 'mapn' in the code below). I came up with two alternative ways to do the same thing, and benchmarked them with surprising results.

#!/usr/bin/perl -w
use strict;
my @elem = (0..100_000);
use Benchmark ':all';
cmpthese ( 100, {
    'mapn' => sub { my @e = @elem; @e = map { $_ if (s/0/./g || 1) } @e },
    'for ' => sub { my @e = @elem; for (@e) { s/0/./g } },
    'mapc' => sub { my @e = map { $_ if (s/0/./g || 1) } @elem },
});

Results:

        Rate mapn mapc for
mapn 0.703/s   -- -41% -63%
mapc  1.19/s  70%   -- -37%
for   1.89/s 170%  59%   --

It is no surprise that both 'mapc' and 'for' tests are faster than 'mapn'. What surprised me was that 'for' was faster than 'mapc', mostly because I don't understand why. I ran the benchmark with arrays of various sizes, and with several different complexities of regex: the results are always extremely similar.

How does map{} differ from for(){}, and when is map faster (if ever)?


The Eightfold Path: 'use warnings;', 'use strict;', 'use diagnostics;', perltidy, CGI or CGI::Simple, try the CPAN first, big modules and small scripts, test first.

Replies are listed 'Best First'.
Re: When should I use map, for?
by polettix (Vicar) on May 19, 2005 at 22:29 UTC
    Just out of curiosity, I tried Algorithm::Loops as well. The results seem to indicate that if you want readability you'd better use it, but if you really need speed it's better to roll your own. Beware of the pitfalls, though.
    #!/usr/bin/perl -w
    use strict;
    my @elem = (0..100_000);
    use Benchmark ':all';
    use Algorithm::Loops 'Filter';
    cmpthese ( 100, {
        'mapn'   => sub { my @e = @elem; @e = map { $_ if (s/0/./g || 1) } @e },
        'for '   => sub { my @e = @elem; for (@e) { s/0/./g } },
        'mapc'   => sub { my @e = map { $_ if (s/0/./g || 1) } @elem },
        'Filter' => sub { my @e = Filter { s/^\s+// } @elem },
    });
    __END__
    ~/sviluppo/perl> ./loops.pl
             Rate Filter   mapn   mapc    for
    Filter 4.00/s     --    -1%   -39%   -53%
    mapn   4.06/s     1%     --   -38%   -52%
    mapc   6.57/s    64%    62%     --   -23%
    for    8.54/s   113%   110%    30%     --
    Indeed, the implementation of Filter resembles that of for, but its generality has two drawbacks:
    • There are two copies, one entering the function and one upon exit
    • The provided subroutine reference gets called for every element, which IMHO should incur some penalty.
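To make the two drawbacks concrete, here is a minimal sketch of a Filter-style helper. The name my_filter is hypothetical and this is a simplified stand-in, not the actual Algorithm::Loops source; it only shows where the two copies and the per-element sub call would come from.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Filter-style helper (NOT the Algorithm::Loops code).
sub my_filter(&@) {
    my ($code, @copy) = @_;   # copy 1: the arguments are copied on entry
    $code->() for @copy;      # one sub call per element, with $_ aliased into @copy
    return @copy;             # copy 2: the list is copied again on return
}

my @elem = (0, 10, 205);
my @e = my_filter { s/0/./g } @elem;

print "@e\n";      # . 1. 2.5
print "@elem\n";   # 0 10 205 (the original is untouched)
```

A bare for loop over a single copy skips both the second copy and the sub calls, which may account for part of the gap in the numbers above.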
    Just my 2c.

    Flavio (perl -e 'print(scalar(reverse("\nti.xittelop\@oivalf")))')

    Don't fool yourself.
Re: When should I use map, for?
by Transient (Hermit) on May 19, 2005 at 20:21 UTC
    What version of perl are you using? I believe map was optimized recently. Previously it was slower than foreach.

    Generally speaking, I use map if I care about the returned list, and foreach if I don't (void context).

    Perhaps you could try tr/0/./? I believe that should be faster than s///.
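A small sketch of that suggestion, assuming the substitution really is a one-character-for-one-character replacement as in the posted benchmark. The array names are illustrative only.

```perl
use strict;
use warnings;

my @elem = (0 .. 20);

my @via_s  = @elem;
my @via_tr = @elem;

s/0/./g for @via_s;    # regex substitution on each element
tr/0/./ for @via_tr;   # transliteration: character mapping, no regex involved

print "same\n" if "@via_s" eq "@via_tr";
```

This only holds for single-character mappings; anything needing a real pattern still requires s///.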
      I am using 5.8.6, so the delay shouldn't be a "recent fix" thing. Also, while tr/// would work for this particular case, the live-code version is not a transliteration.


        Understood. I've noticed similar benchmark results when I ran your test. Perhaps what I am vaguely recalling is a "narrowing of the gap", so to speak.

        I refer back to my original post about when to use map vs. foreach. In this case, where it is executing many many times, it will make a difference. However, in the bulk of my coding, I use one or the other when it makes sense (primarily in a list vs. void context) and worry about performance later. I don't believe there is a time where, all other things being equal, map will be faster than foreach. I could be completely off the mark on this, however.
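As a rough illustration of the list-versus-void rule of thumb (the data here is made up for the example):

```perl
use strict;
use warnings;

my @words = qw(apple banana cherry);

# List context: the resulting list is the point, so map reads naturally.
my @lengths = map { length($_) } @words;

# Void context: only the side effect matters, so foreach fits better.
my $total = 0;
$total += length($_) for @words;

print "@lengths\n";   # 5 6 6
print "$total\n";     # 17
```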
Re: When should I use map, for?
by fishbot_v2 (Chaplain) on May 19, 2005 at 20:56 UTC

    So, I might be mistaken here, but:

    sub { my @e = map { $_ if (s/0/./g || 1) } @elem }

    modifies global @elem in place. The s///g acts on $_, which is an alias to the list item. After the first iteration of mapc, you've altered the sample data. I'm not sure that matters: because of the /g, it has to inspect all the characters regardless.

    As has been mentioned already, the map versions both have two needless logical branches, which taints the results.

    A better comparison might be something that doesn't act in-place. Like an addition:

    my @b = map { $_ + 2 } @elem;
    # vs
    my @b;
    for ( @elem ) { push @b, $_ + 2; }

    # -> results:
    #       Rate  map  for
    # map 60.3/s   -- -23%
    # for 78.2/s  30%   --
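For anyone who wants to rerun that comparison, here is one way it could be wrapped in a complete script. The sub names with_map and with_for, the array size, and the iteration count are assumptions for this sketch, not taken from the post; timings will differ by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @elem = (0 .. 100_000);

# Two ways to build a new array without touching @elem.
sub with_map { my @b = map { $_ + 2 } @elem; return \@b }
sub with_for { my @b; push @b, $_ + 2 for @elem; return \@b }

# Sanity check: both must produce the same list before timing means anything.
die "results differ" unless "@{ with_map() }" eq "@{ with_for() }";

# 50 iterations is an arbitrary choice; raise it for steadier numbers.
cmpthese(50, {
    'map' => \&with_map,
    'for' => \&with_for,
});
```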

      That particular map does not modify @elem in-place. Rather, it copies one element at a time, leading to the same result. The needless logical branch isn't, because map would otherwise return a list of the results of the s///.

      I'm aware that the extra logical branch adds a small amount of complexity; however, the || short-circuits, so if a substitution is made, the '1' isn't evaluated. Also, the '1' doesn't take a lot of time to evaluate. It's also needed to accomplish the substitution in that way. Even in your results the for is notably faster -- so your point seems moot anyhow.

      When is map more appropriate (faster) than for, and vice-versa?



        Have you tried it? It copies -and- modifies -both-.

        my @elem = ( 1..20 );
        my @e = map { $_ if (s/0/./g || 1) } @elem;
        print join( " ", @elem ), "\n";
        __END__
        1 2 3 4 5 6 7 8 9 1. 11 12 13 14 15 16 17 18 19 2.

        I understand -why- you put the logical path there. Just saying s/0/./g; $_; makes more sense than the if and or.
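If the source array must stay intact, one option is to copy each element into a lexical inside the block before substituting. This is a sketch of that idea, not something from the original benchmark, and the extra copy will have its own cost.

```perl
use strict;
use warnings;

my @elem = (1 .. 20);

# Copy $_ into a lexical first, so the s/// never touches the aliased element.
my @e = map { (my $copy = $_) =~ s/0/./g; $copy } @elem;

print "$e[9] $e[19]\n";         # 1. 2.
print "$elem[9] $elem[19]\n";   # 10 20 (unchanged)
```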

        But you are quite right. It is moot. map is still slower, though arguably clearer in the $_ + 2 case. I don't know that map {} is ever an optimization.

        I would use grep to filter records, or map to preprocess going into a sort. I find that makes sense to me coding-wise. I doubt that it is better speed-wise.
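A short sketch of both uses, with made-up "name:age" records as sample data:

```perl
use strict;
use warnings;

# Hypothetical sample records in "name:age" form.
my @records = ('carol:31', 'alice:27', 'bob:45', 'dave:19');

# grep filters: keep only records whose age field is at least 21.
my @adults = grep { (split /:/)[1] >= 21 } @records;

# map preprocesses for sort: extract the age once per record,
# sort on it, then unwrap the original record.
my @by_age =
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] }
    map  { [ (split /:/)[1], $_ ] }
    @adults;

print "@by_age\n";   # alice:27 carol:31 bob:45
```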

        addendum:

        Okay, here is a case where I think map is cleaner and (slightly) faster:

        my @elem = map { int rand 1000 } 1..10000;

        sub map_s {
            my @mod_sorted = sort { $a <=> $b } map { $_ % 2**6 } @elem;
        }

        sub for_s {
            my @a;
            for ( @elem ) { push @a, $_ % 2**6; }
            my @mod_sorted = sort { $a <=> $b } @a;
        }
        __END__
              Rate  for  map
        for 15.8/s   --  -4%
        map 16.5/s   4%   --

Re: When should I use map, for?
by mrborisguy (Hermit) on May 19, 2005 at 20:21 UTC

    Is this fair? Why do you include a || 1 in the map versions, but not the for version?

    -Bryan

      Since map returns the value in the BLOCK, there would be differing results if the OP didn't do this.

      Although, to be completely fair, you could do:
      s/0/./g; $_;
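A quick check that the two block styles return the same list, using a small made-up array. Separate copies are used because both forms modify their input through the $_ alias.

```perl
use strict;
use warnings;

my @a = (0, 10, 205);
my @b = @a;

# Original style: the "|| 1" forces the condition true so $_ is always returned.
my @via_if  = map { $_ if (s/0/./g || 1) } @a;

# Suggested style: substitute, then return $_ explicitly.
my @via_seq = map { s/0/./g; $_ } @b;

print "same\n" if "@via_if" eq "@via_seq";
print "@via_seq\n";   # . 1. 2.5
```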

        Thanks for the clarification!

        -Bryan