Re^3: Making a hash with lists

Ew. Copying both arrays to avoid destroying them is horribly inefficient. As is building an array only to return it as a list.

You can't judge code performance by eyeballing it. Let's try a simple measurement instead.

Here's my benchmark code (~~hopefully without any embarrassing mistakes!~~ unfortunately with an embarrassing mistake):

#! /usr/bin/perl

use strict;
use warnings;
use Benchmark 'cmpthese';
use List::MoreUtils 'zip'; # Hands up if you can predict the result? :)

sub zip1 {
  my @list1 = @{shift()};
  my @list2 = @{shift()};
  my @zip;
  while (@list1+@list2) {
    push(@zip, shift(@list1), shift(@list2));
  }
  @zip
}

sub zip2 {
    my( $r1, $r2 ) = @_;
    map {
        $r1->[ $_ ], $r2->[ $_ ]
    } 0 .. ( $#$r1 > $#$r2 ? $#$r1 : $#$r2 );
}

for my $length (10, 1000, 10000000) {
    my @a = 1 .. $length;
    my @b = reverse @a;
    print "For $length elements:\n";
    cmpthese(-10, {
        copy  => 'zip1 \@a, \@b',
        deref => 'zip2 \@a, \@b',
        CPAN  => 'zip   @a,  @b',
    });
    print "\n";
}

And here are the (misleading) results:

For 10 elements:
           Rate deref  copy  CPAN
deref  672475/s    --  -15%  -67%
copy   795069/s   18%    --  -62%
CPAN  2066450/s  207%  160%    --

For 1000 elements:
           Rate deref  copy  CPAN
deref  671925/s    --  -14%  -68%
copy   781043/s   16%    --  -62%
CPAN  2071516/s  208%  165%    --

For 10000000 elements:
           Rate deref  copy  CPAN
deref  653053/s    --  -15%  -68%
copy   771081/s   18%    --  -62%
CPAN  2045217/s  213%  165%    --

So, we have one unsurprising result: optimised, tested, standard CPAN code (List::MoreUtils::zip) is significantly more efficient than reinventing the wheel. :)

And we have a surprising result, too: copying and shifting may look less efficient than dereferencing, but in practice it's a good 15% faster -- in this particular instance, on my particular computer (Linux 2.6, x86_64, Perl 5.8.8), that is.

Wait, have I got this right? It seems odd that the speed changes so little as I change the length of the lists. Hmm.

Update: yup, I was wrong; blasted scoping rules hiding my arrays from the evalled code. BrowserUk's corrected benchmark (below) shows that the dereferencing version is significantly faster than the copying version, and its advantage increases as the arrays grow. However, List::MoreUtils::zip does still outperform both, particularly on shorter arrays.
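For anyone who wants to see the scoping problem in isolation, here is a minimal sketch. The helper eval_elsewhere is a made-up stand-in for the way a benchmarking module evals a code string in its own scope; it is not Benchmark's actual code. It only illustrates that a string compiled somewhere else cannot see the caller's "my" variables, whereas an anonymous sub closes over them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a module that evals a code string for us.
# The string is compiled inside this sub, where the caller's lexicals
# (declared further down) are not in scope.
sub eval_elsewhere { no strict; return eval shift }

my @a = (1 .. 5);

# String form: with strict off, @a silently resolves to the empty
# package variable @main::a, not to the lexical declared above.
my $from_string = eval_elsewhere('scalar @a');

# Closure form: the anonymous sub captures the lexical @a.
my $from_closure = sub { scalar @a }->();

print "string eval sees $from_string elements\n";   # prints 0
print "closure sees $from_closure elements\n";      # prints 5
```

That is consistent with the nearly identical rates above for 10 and 10000000 elements: the timed strings were most likely operating on empty arrays every time.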

Re^4: Making a hash with lists by BrowserUk (Patriarch) on Jun 28, 2008 at 13:58 UTC
Try this variation of your benchmark:

#! perl

use strict;
use warnings;
use Benchmark 'cmpthese';
use List::MoreUtils 'zip'; # Hands up if you can predict the result? :)

sub zip1 {
  my @list1 = @{shift()};
  my @list2 = @{shift()};
  my @zip;
  while (@list1+@list2) {
    push(@zip, shift(@list1), shift(@list2));
  }
  @zip
}

sub zip2 {
    my( $r1, $r2 ) = @_;
    map {
        $r1->[ $_ ], $r2->[ $_ ]
    } 0 .. ( $#$r1 > $#$r2 ? $#$r1 : $#$r2 );
}

for my $length (10, 1000, 10000, 100000) {
    my @a = 1 .. $length;
    my @b = reverse @a;
    print "For $length elements:\n";
    cmpthese(-3, {
        copy  => sub{ my @res = zip1( \@a, \@b ) },
        deref => sub{ my @res = zip2( \@a, \@b ) },
        CPAN  => sub{ my @res = zip(   @a,  @b ) },
    });
    print "\n";
}

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."
Re^5: Making a hash with lists by Porculus (Hermit) on Jun 28, 2008 at 16:29 UTC
Oops. I thought those results looked fishy. Thanks; I've learnt something today. (This must be quite a common mistake to make. I would have expected the "strict" pragma to pick up the scoping error, but it looks like 'strict' isn't active within the evalled code by default.)
Re^6: Making a hash with lists by BrowserUk (Patriarch) on Jun 28, 2008 at 19:08 UTC
but it looks like 'strict' isn't active within the evalled code by default

By default it isn't, due to this piece of nonsense from Benchmark.pm:

# evaluate something in a clean lexical environment
sub _doeval { no strict; eval shift }

If you remove the `no strict;` as I have, then you'd have got:

For 10 elements:
runloop unable to compile ' my @res = zip( @a, @b ) ': Global symbol "@a" requires explicit package name at (eval 4) line 1.
Global symbol "@b" requires explicit package name at (eval 4) line 1.
code: sub { for (1 .. 1) { local $_; package main; my @res = zip( @a, @b ) ;} } at c:\test\junk9.pl line 29

Which seems far more useful than the current behaviour. I'm not sure what the author hoped to achieve by adding it?

The other thing to note in my version is the assignment of the result. Without this, Perl may optimise away code that appears in a void context. In 5.10 for example, map called in a void context doesn't bother stacking the results which would further skew the benchmark.
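On the context point, a small sketch may help. It does not demonstrate the optimisation itself, only that a sub can tell whether its caller wants a result. The names probe and @seen are made up for illustration. A benchmark sub whose last statement is a bare call (no assignment) runs that call in whatever context the harness calls the sub in, which would be void if the harness discards the result.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @seen;

# Hypothetical probe: records the context it was called in.
# wantarray is undef in void context, true in list, false in scalar.
sub probe {
    push @seen, !defined(wantarray) ? 'void'
              : wantarray           ? 'list'
              :                       'scalar';
    return;
}

probe();             # result discarded: void context
my @r = probe();     # assigned to an array: list context
my $s = probe();     # assigned to a scalar: scalar context

print "@seen\n";     # prints: void list scalar
```

Assigning to `my @res` inside each timed sub, as in the corrected benchmark, forces list context, so every variant has to build its full result.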