in reply to Re^2: Making a hash with lists
in thread Making a hash with lists

Ew. Copying both arrays to avoid destroying them is horribly inefficient. As is building an array only to return it as a list.

You can't judge code performance by eyeballing it. Let's try a simple measurement instead.

Here's my benchmark code (unfortunately with an embarrassing mistake; see the update further down):

#! /usr/bin/perl
use strict;
use warnings;
use Benchmark 'cmpthese';
use List::MoreUtils 'zip';

# Hands up if you can predict the result? :)

sub zip1 {
    my @list1 = @{shift()};
    my @list2 = @{shift()};
    my @zip;
    while (@list1+@list2) {
        push(@zip, shift(@list1), shift(@list2));
    }
    @zip
}

sub zip2 {
    my( $r1, $r2 ) = @_;
    map { $r1->[ $_ ], $r2->[ $_ ] } 0 .. ( $#$r1 > $#$r2 ? $#$r1 : $#$r2 );
}

for my $length (10, 1000, 10000000) {
    my @a = 1 .. $length;
    my @b = reverse @a;
    print "For $length elements:\n";
    cmpthese(-10, {
        copy  => 'zip1 \@a, \@b',
        deref => 'zip2 \@a, \@b',
        CPAN  => 'zip @a, @b',
    });
    print "\n";
}

And here are the (misleading) results:

For 10 elements:
           Rate deref  copy  CPAN
deref  672475/s    --  -15%  -67%
copy   795069/s   18%    --  -62%
CPAN  2066450/s  207%  160%    --

For 1000 elements:
           Rate deref  copy  CPAN
deref  671925/s    --  -14%  -68%
copy   781043/s   16%    --  -62%
CPAN  2071516/s  208%  165%    --

For 10000000 elements:
           Rate deref  copy  CPAN
deref  653053/s    --  -15%  -68%
copy   771081/s   18%    --  -62%
CPAN  2045217/s  213%  165%    --

So, we have one unsurprising result: optimised, tested, standard CPAN code (List::MoreUtils::zip) is significantly more efficient than reinventing the wheel. :)

And we have a surprising result, too: copying and shifting may look less efficient than dereferencing, but in practice it's a good 15% faster -- in this particular instance, on my particular computer (Linux 2.6, x86_64, Perl 5.8.8), that is.

Wait, have I got this right? It seems odd that the speed changes so little as I change the length of the lists. Hmm.

Update: yup, I was wrong; blasted scoping rules hiding my arrays from the evalled code. BrowserUk's corrected benchmark (below) shows that the dereferencing version is significantly faster than the copying version, and its advantage increases as the arrays grow. However, List::MoreUtils::zip does still outperform both, particularly on shorter arrays.
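To make the scoping problem concrete, here is a minimal sketch of what goes wrong when code is handed over as a string. The package Hypothetical::Runner and its two subs are invented for illustration; they only mimic the general shape of a module that compiles string code somewhere else, and are not taken from Benchmark.pm.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a benchmarking module: it receives code either
# as a string (compiled later, inside this package) or as a code reference.
{
    package Hypothetical::Runner;
    sub run_string { no strict; return eval shift }
    sub run_sub    { return shift->() }
}

my @a = 1 .. 1000;

# The string is compiled inside run_string, where the lexical @a from this
# file is not in scope, so @a silently means an empty package variable.
my $seen_by_string = Hypothetical::Runner::run_string('scalar @a');

# The anonymous sub is a closure: it captures the lexical @a where it is written.
my $seen_by_sub = Hypothetical::Runner::run_sub( sub { scalar @a } );

print "string eval sees $seen_by_string elements\n";
print "code ref sees $seen_by_sub elements\n";
```

If the string version really is timing empty arrays, that would explain why the rates barely moved between 10 and 10000000 elements.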

Re^4: Making a hash with lists
by BrowserUk (Patriarch) on Jun 28, 2008 at 13:58 UTC

    Try this variation of your benchmark:

    #! perl
    use strict;
    use warnings;
    use Benchmark 'cmpthese';
    use List::MoreUtils 'zip';

    # Hands up if you can predict the result? :)

    sub zip1 {
        my @list1 = @{shift()};
        my @list2 = @{shift()};
        my @zip;
        while (@list1+@list2) {
            push(@zip, shift(@list1), shift(@list2));
        }
        @zip
    }

    sub zip2 {
        my( $r1, $r2 ) = @_;
        map { $r1->[ $_ ], $r2->[ $_ ] } 0 .. ( $#$r1 > $#$r2 ? $#$r1 : $#$r2 );
    }

    for my $length (10, 1000, 10000, 100000) {
        my @a = 1 .. $length;
        my @b = reverse @a;
        print "For $length elements:\n";
        cmpthese(-3, {
            copy  => sub{ my @res = zip1( \@a, \@b ) },
            deref => sub{ my @res = zip2( \@a, \@b ) },
            CPAN  => sub{ my @res = zip( @a, @b ) },
        });
        print "\n";
    }
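Before trusting any timings, it may be worth checking that the candidates actually produce the same output on real data. The following is only a sketch of such a check: it reuses zip1 and zip2 from the thread on a tiny input and leaves out List::MoreUtils so that it runs with core Perl alone.

```perl
use strict;
use warnings;

# Copying version from the thread: shifts elements off private copies.
sub zip1 {
    my @list1 = @{shift()};
    my @list2 = @{shift()};
    my @zip;
    while (@list1+@list2) {
        push(@zip, shift(@list1), shift(@list2));
    }
    @zip
}

# Dereferencing version from the thread: indexes into the original arrays.
sub zip2 {
    my( $r1, $r2 ) = @_;
    map { $r1->[ $_ ], $r2->[ $_ ] } 0 .. ( $#$r1 > $#$r2 ? $#$r1 : $#$r2 );
}

my @a = (1, 2, 3);
my @b = reverse @a;

my @copy  = zip1( \@a, \@b );
my @deref = zip2( \@a, \@b );

print "copy:  @copy\n";
print "deref: @deref\n";
print "same:  ", ("@copy" eq "@deref" ? "yes" : "no"), "\n";
```

A check like this would also have exposed empty input arrays straight away, since both results would have come back empty.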

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Oops. I thought those results looked fishy. Thanks; I've learnt something today.

      (This must be quite a common mistake to make. I would have expected the "strict" pragma to pick up the scoping error, but it looks like 'strict' isn't active within the evalled code by default.)

        but it looks like 'strict' isn't active within the evalled code by default

        By default it isn't, due to this piece of nonsense from Benchmark.pm:

        # evaluate something in a clean lexical environment
        sub _doeval { no strict; eval shift }

        If you remove the no strict; (as I have), you would have got:

        For 10 elements:
        runloop unable to compile ' my @res = zip( @a, @b ) ': Global symbol "@a" requires explicit package name at (eval 4) line 1.
        Global symbol "@b" requires explicit package name at (eval 4) line 1.
        code: sub { for (1 .. 1) { local $_; package main; my @res = zip( @a, @b ) ;} }
         at c:\test\junk9.pl line 29

        Which seems far more useful than the current behaviour. I'm not sure what the author hoped to achieve by adding it.
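The effect of that no strict; can be reproduced without Benchmark at all. This is a minimal sketch, assuming nothing beyond core Perl; the array name @undeclared and the variable names are made up for the illustration.

```perl
use strict;
use warnings;

# A snippet that mentions an array nobody declared.
my $code = 'my $n = scalar @undeclared; 1';

# Compiled with strict in effect: the undeclared array is a compile error.
my $strict_ok  = eval $code;
my $strict_err = $@;

# Compiled with strict switched off, as in the _doeval helper quoted above:
# the name quietly resolves to an empty package variable.
my $lax_ok = do { no strict; eval $code };

print "with strict:    ", ($strict_ok ? "compiled" : "failed"), "\n";
print "  $strict_err" if $strict_err;
print "without strict: ", ($lax_ok ? "compiled" : "failed"), "\n";
```

A string eval inherits the strictness of the scope it is called from, which is why the placement of no strict; inside the helper decides what the benchmarked string is allowed to get away with.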

        The other thing to note in my version is the assignment of the result. Without this, Perl may optimise away code that appears in a void context. In 5.10, for example, map called in a void context doesn't bother stacking the results, which would further skew the benchmark.
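The difference between discarding and assigning a result is visible from inside a sub via wantarray. The sketch below uses a made-up helper, context_probe, purely to show which context each calling style produces; it does not measure anything.

```perl
use strict;
use warnings;

my @seen;

# Hypothetical helper: records the context it was called in.
sub context_probe {
    push @seen, wantarray ? 'list' : defined(wantarray) ? 'scalar' : 'void';
    return;
}

context_probe();              # result discarded: void context
my @res = context_probe();    # assigned to an array: list context
my $n   = context_probe();    # assigned to a scalar: scalar context

print "contexts seen: @seen\n";
```

Assigning to my @res in each benchmarked sub puts every candidate in list context, so all of them have to build their full result.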

