in reply to Re: When would you choose foreach instead of map?
in thread When would you choose foreach instead of map?

Is it possible to generate the result 'X' in such a way that I can automatically deduce the method that was used to generate that result (without looking at the source code) by evaluating one or more quantifiable measures of fitness?

Where 'X' is the sum of the numbers 1 .. 1_000_000.

p:\test>x1
500000500000

p:\test>x2
500000500000

The programs (in no particular order).

map{ $sum += $_ } 1 .. 1000000; print $sum;
and
$sum += $_ for 1 .. 1000000; print $sum;

External metric: One of them uses 3MB, the other 96MB :)


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail

Replies are listed 'Best First'.
Re: Re: Re: When would you choose foreach instead of map?
by eric256 (Parson) on May 21, 2004 at 04:17 UTC
    I don't know how to check memory usage (other than the blunt method of watching the task manager), but here are some speed tests. It appears from this test that they are very similar in speed for this summing operation. for is the winner, but I'm not sure the difference is noticeable until you climb into the several hundred thousand range.
    use strict;
    use warnings;
    use Benchmark qw(:all);

    my $to = 10_000;

    cmpthese(-5, {
        'Map100k' => sub { my $sum = 0; map { $sum += $_ } 1 .. 100_000; },
        'For100k' => sub { my $sum = 0; $sum += $_ for 1 .. 100_000;     },
        'Map10k'  => sub { my $sum = 0; map { $sum += $_ } 1 .. 10_000;  },
        'For10k'  => sub { my $sum = 0; $sum += $_ for 1 .. 10_000;      },
        'Map1k'   => sub { my $sum = 0; map { $sum += $_ } 1 .. 1_000;   },
        'For1k'   => sub { my $sum = 0; $sum += $_ for 1 .. 1_000;       },
    });

    __END__
               Rate Map100k For100k Map10k For10k  Map1k  For1k
    Map100k  18.6/s      --    -19%   -90%   -92%   -99%   -99%
    For100k  23.1/s     24%      --   -88%   -90%   -99%   -99%
    Map10k    187/s    905%    709%     --   -22%   -91%   -92%
    For10k    239/s   1182%    933%    28%     --   -89%   -90%
    Map1k    2077/s  11052%   8880%  1009%   770%     --   -13%
    For1k    2387/s  12715%  10219%  1175%   899%    15%     --

    ___________
    Eric Hodges
      Your benchmark isn't comparing a for block vs map. It's comparing the for statement modifier against map. Statement modifiers don't have the overhead of entering and leaving a scope. Of course, in this particular case, the clear winner is the one that doesn't loop at all.
      #!/usr/bin/perl

      use strict;
      use warnings;

      use Benchmark qw /cmpthese/;

      my $base = 1000;

      foreach my $mult (1, 10, 100) {
          my $code;
          my $max = $mult * $base;

          {
              no strict 'refs';
              @{"main::array$mult"} = 1 .. $max;
          }

          my $map_var = "\$sum${mult}map";
          my $for_var = "\$sum${mult}for";
          my $mod_var = "\$sum${mult}mod";
          my $exp_var = "\$sum${mult}exp";

          $code->{"map${mult}k"} = "$map_var = 0;" . "map {$map_var += \$_} \@array$mult";
          $code->{"for${mult}k"} = "$for_var = 0;" . "for (\@array$mult) {$for_var += \$_}";
          $code->{"mod${mult}k"} = "$mod_var = 0;" . "$mod_var += \$_ for \@array$mult";
          $code->{"exp${mult}k"} = "$exp_var = $max * ($max + 1) / 2;";

          cmpthese -1 => $code;
          print "\n";

          no strict 'refs';
          die "Unequal\n" unless ${"sum${mult}map"} == ${"sum${mult}for"} &&
                                 ${"sum${mult}mod"} == ${"sum${mult}exp"} &&
                                 ${"sum${mult}map"} == ${"sum${mult}exp"};
      }

      __END__
                    Rate     map1k     for1k     mod1k     exp1k
      map1k       1305/s        --      -58%      -64%     -100%
      for1k       3140/s      141%        --      -12%     -100%
      mod1k       3589/s      175%       14%        --     -100%
      exp1k    6859842/s   525617%   218353%   191047%        --

                    Rate    map10k    for10k    mod10k    exp10k
      map10k       123/s        --      -59%      -66%     -100%
      for10k       299/s      144%        --      -16%     -100%
      mod10k       356/s      190%       19%        --     -100%
      exp10k   7021713/s  5717581%  2347785%  1970572%        --

                    Rate   map100k   for100k   mod100k   exp100k
      map100k     12.3/s        --      -60%      -64%     -100%
      for100k     30.5/s      148%        --      -11%     -100%
      mod100k     34.2/s      179%       12%        --     -100%
      exp100k  5383313/s 43894609% 17663897% 15724842%        --

      Abigail

        Thanks for the info on for as a modifier. I thought it was just a shortcut and that the compiler produced a normal for block out of it. Good info, and I'm sure we all knew that an equation to calculate the total would be better/faster/smarter. We weren't benchmarking the math, just the means of looping.
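        For reference, the closed form used by the exp entries above is just N*(N+1)/2; for N = 1_000_000 that gives the 500000500000 shown at the top of the thread:

            my $n = 1_000_000;
            print $n * ($n + 1) / 2;    # 500000500000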

        ___________
        Eric Hodges
Re^3: When would you choose foreach instead of map?
by Anonymous Monk on May 21, 2004 at 05:39 UTC

    I thought map() was optimized for use in void context. I'd never use map() in a case like this, as for() is much more elegant (all IMO, of course). Can you explain why the map() version takes up so much memory?

      I believe* that the reason has nothing to do with the void context. for is optimised to treat the range operator as an iterator; by that I mean it doesn't generate a list of 1_000_000 scalars, it simply supplies the next incremented value as $_ for each iteration.

      map, on the other hand, doesn't have this optimisation, so a very large list is generated as its input. This is what (briefly) consumes the memory. For short ranges this isn't a problem, but for larger ones it's worth avoiding.

      * I'm fairly confident that this is the case, but it is acquired knowledge rather than something I can point my finger at the sources and say "There".
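      A rough way to see the difference for yourself: the sketch below is mine, not from the thread, and assumes Linux (it reads VmPeak from /proc/$$/status); the vm_peak() helper is hypothetical. Run it once with each loop style uncommented and compare the VmPeak lines.

      #!/usr/bin/perl
      use strict;
      use warnings;

      # vm_peak() reports this process's peak memory usage (Linux /proc only).
      sub vm_peak {
          open my $fh, '<', "/proc/$$/status" or die "Can't read /proc/$$/status: $!";
          my ($peak) = grep { /^VmPeak:/ } <$fh>;
          return $peak;
      }

      my $sum = 0;

      # Style 1: for as a statement modifier -- the range is iterated, one value at a time.
      $sum += $_ for 1 .. 1_000_000;

      # Style 2: map in void context -- the range is flattened to a million-element list first.
      # map { $sum += $_ } 1 .. 1_000_000;

      print "$sum\n", vm_peak();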


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
Re: Re: Re: When would you choose foreach instead of map?
by dimar (Curate) on May 21, 2004 at 16:49 UTC

    I am gonna go out on a limb and wildly speculate that if one of them uses 96MB and the tests are accurate, it was *map*, based on the whole 'side-effect' thingy. If true, this re-emphasizes the original point that this is the one (non-preference-based) difference.

      As always, don't take my word for it, do your own test. All (two lines) of the code is there. The only instrumentation required is a <STDIN>; to prevent the processes from exiting, plus top or the task manager.
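      Something like this, for example (the file names x1.pl and x2.pl are my guess at what the original test scripts looked like; only the <STDIN> line is added instrumentation):

      # x1.pl
      my $sum = 0;
      map { $sum += $_ } 1 .. 1_000_000;
      print $sum;
      <STDIN>;    # pause so the process's memory can be read from top / the task manager

      # x2.pl
      my $sum = 0;
      $sum += $_ for 1 .. 1_000_000;
      print $sum;
      <STDIN>;    # pause so the process's memory can be read from top / the task manager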

      Is this the "'side-effect' thingy"? I guess you could classify it that way :)


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail