Curious Perl Behavior...

I recently finished reading the new addition of "Effective Perl Programming: Ways to Write Better, More Idiomatic Perl", 2ed, by Joseph Hall, Joshua McAdama, and brian d foy. It is a throuhly delightful read (see my review in the "Reviews" section of the Perl Monks site) and I have begun going back through it to explore the multitude of good examples. I have run upon a curous behavior that seems to defy the wisdom in one place in the text.

The example is drawn from Chapter 4: Subroutines on Item 46: Pass references instead of copies. The example is as follows:

Passing Copies:

  sub sum {
     my @numbers = @_;
     my $sum = 0;
     foreach my $num (@numbers){ $sum += $num }
     $sum;
  }
  print sum( 1 .. 100_000 ),"\m";
[download]

Versus a version that passes references, a la:

  sub sum {
     my ($numbers_ref) = @_;
     my $sum = 0;
     foreach my $num (@$numbers_ref){ $sum += $num };
     $sum;
  }
  print sum([1..100_000]), "\n";
[download]

In the text the authors state that in the firs case "Perl has to copy 100,000 elements in @numbers in sum, although it doesn't change any of the values. Perl does a lot of work for no good reason...A small change in the subroutine fixes that." And that results in the second examle of sum.

So I coded each up and decided to see just how much difference it made. I used the module Time::HiRes using the following code:

#!/user/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);
sub sum {
   my ($numbers_ref) = @_;
   my $sum = 0;
   foreach my $num (@$numbers_ref){ $sum += $num };
   return $sum;
} # end sub sum()
my $t0 = [gettimeofday];
print sum([1..100_000]);
print "\n";
my $t1 = [gettimeofday];
my $elapsed = tv_interval($t0, $t1);
print "\n";
print "Elapsed time: $elapsed seconds\n";
exit(0);
[download]

and got the following results:

   First version of sub():    0.027007 seconds
   Second version of sub():   0.024075 seconds
[download]

So far, so good...exactly as the authors contended...not as dramatic a difference as I had imagined, but clearly consistent with the authors' position.

Ever the curious type, I decided to try a more dramatic case...I decided to try it with 1,000,000 numbers. When I did, I got the following unexpected result:

   First version of sum():   0.09366 seconds
   Second version of sum():  0.372829 seconds
[download]

The first part of the result seems logical to me since it uses more numbers and is appropriately greater (timewise) than the first test cases offered by the authors. But the performace of the second version of my 1,000,000 number case is most curious, unexpected and inexplicable to me: the pass-by-ref case takes almost 40 times LONGER than the pass-by-copies case! Anyone got some insight into why that happens?

For reference I am running on Windows XP (Version 2002 Service Pack 3) on an Intel Core 2 CPU 6300 running @ 1.86 GHz with 3.49 GBytes of 1.86 GHz RAM. I am using ActiveStates v5.8.8 Build 820 Perl.

UPDATE: Thanks everyone for the responses. Seems I wasn't timing what I thought I was timing. Thanks especilly to linuxer and BrowserUk for your help. I was inadvertantly timing the creation of the references rather than how long it took to process with and without them. Their results restored the truth and are consistent with what the authors' thesis was. Thanks, again, for the help.

ack Albuquerque, NM

Comment on Curious Perl Behavior... Select or Download Code

Replies are listed 'Best First'.
Re: Curious Perl Behavior... by BrowserUk (Patriarch) on May 25, 2010 at 21:51 UTC
The problem is that you are including the time taken to construct the original (anonymous) array. Try this and see how you get on (the timings shown are AS 5.10.1 under Vista 64-bit): #!/user/bin/perl use strict; use warnings; use Benchmark qw[ cmpthese ]; sub sum_copy { my @numbers = @_; my $sum = 0; foreach my $num (@numbers){ $sum += $num } $sum; } sub sum_ref { my ($numbers_ref) = @_; my $sum = 0; foreach my $num (@$numbers_ref){ $sum += $num }; $sum; } our $N //= 1e6; my @numbers = 1 .. $N; cmpthese -3, { copy => sub{ my $sum = sum_copy( @numbers ); }, ref => sub{ my $sum = sum_ref( \@numbers ); }, }; __END__ C:\test>junk28 -N=1e4 Rate copy ref copy 3.92/s -- -59% ref 9.67/s 146% -- C:\test>junk28 -N=1e5 Rate copy ref copy 3.95/s -- -59% ref 9.62/s 144% -- C:\test>junk28 -N=1e6 Rate copy ref copy 3.92/s -- -59% ref 9.66/s 146% -- [download] Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. RIP an inspiration; A true Folk's Guy	[reply] [d/l]
Re^2: Curious Perl Behavior... by ack (Deacon) on May 26, 2010 at 15:23 UTC
This is consistent with what linuxer was saying, too. This helps me a lot and I like your approach, too. Thanks so much, again, for helping me to restore my sense of order in the world. My results were puzzling to me and left with a bit of virtigo. My quick look' was too quick and, as I noted in my response to linuxer, I forgot one of the lessons that our Benchmarking exist for a reason: they make it easy (easier) to do the job right. Again, many thanks BrowseUk. ack Albuquerque, NM	[reply]
Re: Curious Perl Behavior... by linuxer (Curate) on May 25, 2010 at 21:41 UTC
You are measuring the time for creating the array reference. Don't do: `print sum([1..100_000]);` Do something like this: #! /usr/bin/perl # vim:ts=4 sw=4 sts=4 et nu fdc=3: use strict; use warnings; use Benchmark qw( cmpthese ); #> sub routines #> ------------------------------------------------------------------- +--------- sub sum_list { my @numbers = @_; my $sum = 0; for my $num ( @numbers ) { $sum += $num; } return $sum; } sub sum_by_ref { my ( $numbers_ref ) = @_; my $sum = 0; for my $num ( @$numbers_ref ) { $sum += $num; } return $sum; } #> main script #> ------------------------------------------------------------------- +--------- my @numbers = 1 .. 1_000_000; cmpthese( -1, { 'sum_list' => sub { sum_list(@numbers); }, 'sum_by_ref' => sub { sum_by_ref(\@numbers); }, }); __END__ [download] Result: `Rate sum_list sum_by_ref sum_list 4.42/s -- -45% sum_by_ref 8.00/s 81% --` [download] I hope my point got clear ;o) Addendum: Tested with creating list and reference directly within function call: `cmpthese( -1, { 'sum_list' => sub { sum_list(1..1_000_000); }, 'sum_by_ref' => sub { sum_by_ref([1..1_000_000]); }, }); Result: Rate sum_list sum_by_ref sum_list 4.13/s -- -9% sum_by_ref 4.55/s 10% --` [download] edit: fixed missing "`1..`" in last snippet. Thanks ikegami. And fixed result output as well.	[reply] [d/l] [select]
Re^2: Curious Perl Behavior... by ack (Deacon) on May 26, 2010 at 15:17 UTC
Thanks, linuxer. I was trying to do it 'quick n dirty' and obviously did it too 'quick n dirty' without paying attention to what I was really doing. Your results make me feel more confident that I see what is going on and tought me (or re-tought me) a lesson that we talk about periodically in the Monestary: benchmarking is good...but do it right. The tools exist for a reason: to make it easy (or at least easier) to 'do it right'. I think I need to make a trip to the confessional. Again, thanks linuxer. ack Albuquerque, NM	[reply]
Re: Curious Perl Behavior... by repellent (Priest) on May 25, 2010 at 21:33 UTC
Doesn't de-referencing the arrayref `@$numbers_ref` do an implicit copy (and allocate more space in the process)? I'm not sure. I imagine a counting loop version would be more space-efficient: `sub sum { my ($numbers_ref) = @_; my $sum = 0; foreach my $i (0 .. $#{ $numbers_ref }) { $sum += $numbers_ref->[$i]; } return $sum; }` [download]	[reply] [d/l] [select]
Re^2: Curious Perl Behavior... by ack (Deacon) on May 26, 2010 at 15:09 UTC
Good question...I don't know the answer. The authors apparently didn't think that there is copying going on in the dereferencing since that was essentially the core thesis that they were presenting in that sub-section. I am (or was) assuming that they're pretty tuned in to how Perl's innards work. But then again it wouldn't be the first time that misconceptions even on renouwned authors' expertise were revealed. In fact that is the source of my commentary and curiosity...is something going on that the authors didn't quite get right? I do like you're thinking, though. My only concern would be its generality in the sense that if the array that `$numbers_ref` refers to contains numbers other than a simple 0 to N then `sum()` would not work correctly. Thank you so much for your ideas and comments. I really appreciate them and you have given me some good ideas. ack Albuquerque, NM	[reply] [d/l] [select]
Re: Curious Perl Behavior... by deMize (Monk) on May 28, 2010 at 19:13 UTC
I was curious if anyone could explain the follow-up. Ack, you said you were timing the creation of the references. I thought there was only one reference created, with multiple elements. Why would creating a reference take that much longer than creating and passing the full array? Wouldn't they be almost equal?	[reply]
Re^2: Curious Perl Behavior... by BrowserUk (Patriarch) on May 28, 2010 at 19:59 UTC
The difference is the amount of memory allocated and the way that memory is structured. When passing a list of integers, they are just pushed onto a stack as a series of pointers to small (probably SvIVs), discrete chunks of memory. When passing a reference to an array, that array has to be constructed first. And an array consists of single, contiguous lump of ramp containing pointers to SVs that contain pointers to the SvIVs. For large arrays, allocating that large contiguous chunk of memory will mean a call out to the OS for a new virtual allocation. And that costs a lot relative to (re)allocating small chunks from the existing memory pool. It then has to allocate the memory for the individual elements and write their pointers to the contiguous chunk, much as it does when pushing the list onto a stack. So the extra step is all cost. The savings of passing a reference to an array comes when that array already exists. Then just a pointer is pushed rather than pointers to all the contained elements. Note: My description may not be completely accurate, see PerlGuts Illustrated for the real details. For the difference in numbers:: `sub x{}; cmpthese -1,{ a=>q[ x( 1..1e6 ) ], b=>q[x( [1..1e6] ) ] };; Rate b a b 10.5/s -- -56% a 23.8/s 126% -- @a = 1 .. 1e6;; cmpthese -1,{ a=>q[ x( @a ) ], b=>q[x( \@a ) ] };; Rate a b a 24.1/s -- -100% b 2903218/s 12025922% --` [download] In the first case, the 44% penalty of b is the time taken to allocate that big, contiguous chunk. In the latter, b's huge advantage is pushing 1 reference .v. pushing 1 million references. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. RIP an inspiration; A true Folk's Guy	[reply] [d/l]