Re: Show that using references is faster
by diotalevi (Canon) on Oct 04, 2004 at 22:09 UTC
Your benchmark is flawed. The point of passing references is that you can group multiple things together. The return values and arguments to functions all go on the stack as a list. The less impact you have on this stack, the less pushing/popping work perl will have to do. Less work == faster. In your example you do not vary the number of parameters appropriately. Smaller argument lists are faster:
use Benchmark 'cmpthese';

my @list = 1 .. 1_000_000;
$result = 0;

cmpthese( 0, {
    list => sub { sub { $result = scalar @_ }->(   @list ) },
    refr => sub { sub { $result = scalar @_ }->( \ @list ) } } );

#            Rate     list      refr
# list     20.2/s       --     -100%
# refr   796087/s 3947995%        --
Smaller return lists are faster:

use Benchmark 'cmpthese';

@list   = 1 .. 1_000_000;
@result = @list;
$result = \ @list;

cmpthese( 0, {
    list => sub { @result = sub {   @list }->() },
    refr => sub { $result = sub { \ @list }->() } } );

#            Rate      list      refr
# list     1.67/s        --     -100%
# refr   774198/s 46258208%        --
Re: Show that using references is faster
by blokhead (Monsignor) on Oct 04, 2004 at 22:03 UTC
Your sort is almost certainly the dominant part of the runtime. The sort takes O(n log n) time, while passing an array takes O(n) time, so the sort will overshadow differences in parameter passing as the inputs get larger. I'd use an operation inside the sub that also runs in linear time (like summing the elements).
I'm also pretty sure that @{$_[0]}, used as an argument to sort, ends up making a copy of the array anyway. I'd avoid using the passed array in list context at all within your benchmark; access the items by their indices to avoid any unwanted copying:
use Benchmark 'cmpthese';

my @data = map { rand } 0 .. 4000;

cmpthese( -2, {
    copy => sub { copy(@data) },
    ref  => sub { ref(\@data) }
});

sub copy { my $sum; $sum += $_[$_]    for 0 .. $#_;       return $sum }
sub ref  { my $sum; $sum += $_[0][$_] for 0 .. $#{$_[0]}; return $sum }

__END__
           Rate    copy     ref
copy      370/s      --   -100%
ref   1884708/s 508700%      --
Update: Fixed really dumb error (didn't pass anything to the subs). Code reflects changes (and copy vs. ref is an even bigger difference than before). Thanks bmann.
Re: Show that using references is faster
by chromatic (Archbishop) on Oct 04, 2004 at 22:12 UTC
The sort is eating up most of your time here. You want something to show the expense of the stack manipulations in passing the array directly. How about returning a count of the elements?
use strict;
use Benchmark;

my $len = 4000;
my @data;
push( @data, rand() ) for ( 0 .. $len );

timethese( 90000, {
    'by_array' => sub { my $count = count_array( @data );      },
    'by_ref'   => sub { my $count = count_array_ref( \@data ); },
});

sub count_array
{
    return scalar @_;
}

sub count_array_ref
{
    return scalar @{ $_[0] };
}
Re: Show that using references is faster
by Ven'Tatsu (Deacon) on Oct 04, 2004 at 22:12 UTC
My guess is that sort is taking the majority of the time in your code. If your argument list is long enough, passing it in as a flat list can take a decent chunk of time.
use Benchmark qw(cmpthese);

@a = 1 .. 1_000_000;

sub flat     { my $array = \@_    }
sub flatcopy { my @copy  = @_     }
sub byref    { my $array = $_[0]  }
sub refcopy  { my @copy  = @$_[0] }

cmpthese( -1, {
    flat     => sub { flat(@a)     },
    flatcopy => sub { flatcopy(@a) },
    byref    => sub { byref(@a)    },
    refcopy  => sub { refcopy(@a)  }
});

__END__
             Rate flatcopy  flat refcopy  byref
flatcopy   3.24/s       --  -77%    -91%   -91%
flat       14.3/s     342%    --    -59%   -61%
refcopy    35.3/s     987%  146%      --    -3%
byref      36.3/s    1020%  153%      3%     --
Re: Show that using references is faster
by davido (Cardinal) on Oct 05, 2004 at 04:09 UTC
Others have mentioned that the sort is the dominant component of your benchmark, which is true. Now, back to the discussion of passing a reference versus passing the entire flattened list. Here's what's going on behind the scenes:
When you pass a list, each element of that list is sent. The larger the list, the more elements must be passed in. People who put a pencil to it and figure out how the amount of work being done grows as the data-set grows will tell you that passing a list as a parameter is O(n) (big-oh(n)), meaning that as the dataset grows, there is a 1:1 correlation in the amount of computational work that must be done to move the data around.
When you pass a reference to a list, you are passing but one item; the reference. As your dataset, to which the reference refers, grows, there is zero growth in the amount of work being done to pass the reference around. Those pencil-pushing bean-counters will call that O(1) (big-oh(1)). What that says, in terms that the rest of us understand, is that as the dataset grows, there is no growth in the amount of work being done to pass a reference to it.
While these are theoretical concepts, big-oh notation is useful. Unless there is a serious flaw in an O(1) algorithm, it will always show essentially no growth in computational time as the dataset grows, whereas an O(n) algorithm will always show an approximately 1:1 growth in computational time as the dataset grows. That's pretty much all we need to know; proving it through benchmarks will only echo what is already pretty much indisputable. The benchmarks just serve to quantify what 1:1 turns out to mean in actual time.
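To see that growth in practice, here is a minimal sketch (my own illustration, not code from this thread) that times only the argument passing at a few sizes; the by_list rate should fall roughly in step with the array size, while the by_ref rate should stay roughly flat:

use strict;
use warnings;
use Benchmark 'cmpthese';

for my $n ( 1_000, 10_000, 100_000 ) {
    my @data = 1 .. $n;
    print "--- $n elements ---\n";
    cmpthese( -1, {
        by_list => sub { count_list(@data) },   # pushes $n aliases onto the stack: O(n)
        by_ref  => sub { count_ref(\@data) },   # pushes a single reference: O(1)
    } );
}

sub count_list { return scalar @_         }
sub count_ref  { return scalar @{ $_[0] } }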
Re: Show that using references is faster
by brian_d_foy (Abbot) on Oct 05, 2004 at 05:56 UTC
I ran into the reverse problem: I needed the sort() to dominate the benchmark. As I show in Wasting time thinking about wasted time, you actually have to do a bit of work to make a proper benchmark so you blame the right bits for the slow-downs.
In this case, you should have had a control that measured all the bits except the argument passing so you would know how much of the time is spent on the common code.
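For example, such a control might look something like this (my sketch, not the code from that node); the sort appears in every case, so whatever time the control takes can be mentally subtracted from the other two before comparing them:

use strict;
use warnings;
use Benchmark 'cmpthese';

my @data = map { rand } 1 .. 4000;

cmpthese( -2, {
    control => sub { my @s = sort { $a <=> $b } @data },   # common work only
    by_list => sub { my @s = sort_list(@data)         },   # common work + flat list passing
    by_ref  => sub { my @s = sort_ref(\@data)         },   # common work + reference passing
} );

sub sort_list { return sort { $a <=> $b } @_         }
sub sort_ref  { return sort { $a <=> $b } @{ $_[0] } }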
When you see results that you don't expect, look for the flaw in the experiment first. :)
--
brian d foy <bdfoy@cpan.org>
Re: Show that using references is faster
by BrowserUk (Patriarch) on Oct 05, 2004 at 07:20 UTC
Whilst Perl's built-in sort now has an optimisation that comes into play when the source array and the target array are the same, it is only usable if the array is declared in the same scope as the sort (without trickery). You cannot (yet?) pass an array ref to sort, which means that you're forced to copy the array in your sub, even when you pass an array ref in.
To emphasise the difference, this benchmark uses two pure-perl implementations of the qsort algorithm; they are identical, except that qsortr() sorts the array in place via a reference, whilst qsortv() passes the array to be sorted in and out by value (as a list).
Even for very small arrays, using pass-by-value for this recursive algorithm highlights the cost of indiscriminate copying of lists. With larger arrays it becomes extreme:
P:\test>396406 -N=10 -I=-1
              Rate by_value  xby_ref
by_value   10925/s       --     -69%
xby_ref    35072/s     221%       --

P:\test>396406 -N=1000 -I=-3
              Rate by_value  xby_ref
by_value    2.09/s       --     -99%
xby_ref      219/s   10340%       --
Benchmark:
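The benchmark code itself sits behind the link above and isn't reproduced here; the following is only my sketch of the two approaches the post describes (the names qsortr/qsortv come from the post, everything else is assumed), not BrowserUk's actual code:

use strict;
use warnings;

# In-place quicksort through a reference: partitions within the original
# array, so no sublists are ever copied.
sub qsortr {
    my( $ref, $lo, $hi ) = @_;
    return if $lo >= $hi;
    my $pivot = $ref->[$hi];
    my $i = $lo - 1;
    for my $j ( $lo .. $hi - 1 ) {
        next if $ref->[$j] > $pivot;
        $i++;
        @{$ref}[ $i, $j ] = @{$ref}[ $j, $i ];          # swap elements $i and $j
    }
    @{$ref}[ $i + 1, $hi ] = @{$ref}[ $hi, $i + 1 ];    # move pivot into place
    qsortr( $ref, $lo,    $i  );
    qsortr( $ref, $i + 2, $hi );
}

# Pass-by-value quicksort: every level of recursion copies sublists into
# and out of the subroutine as flat lists.
sub qsortv {
    return @_ if @_ <= 1;
    my $pivot = shift;
    return ( qsortv( grep { $_ <  $pivot } @_ ),
             $pivot,
             qsortv( grep { $_ >= $pivot } @_ ) );
}

my @byref = map { int rand 1000 } 1 .. 10;
qsortr( \@byref, 0, $#byref );                           # sorts @byref in place
my @byval = qsortv( map { int rand 1000 } 1 .. 10 );     # returns a sorted copy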
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
Re: Show that using references is faster
by duff (Parson) on Oct 05, 2004 at 14:35 UTC
"Can someone show me a test that will prove that passing references to a subroutine is better/faster than passing around the data itself? I haven't hit upon a good test that shows passing references is faster."
While you've been given many excellent answers so far, none of them seem to really address the assumptions in the question :) Passing references isn't inherently better/faster than passing the data around, because perl aliases whatever you pass in as @_ rather than copying it. It's the act of copying the data that causes the extra speed/memory hit. So make sure you don't ask perl to make a copy. :-)
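To put a number on that last point, here is a small sketch (my own, not duff's) that passes the same flat list to two subs; the only difference is whether the sub works on the @_ aliases directly or copies them into a lexical array first:

use strict;
use warnings;
use Benchmark 'cmpthese';

my @data = 1 .. 100_000;

# Reads the aliased arguments in place: no copy is made.
sub sum_aliased { my $sum = 0; $sum += $_ for @_;    return $sum }

# Copies the arguments into a lexical first: pays for 100_000 copies.
sub sum_copied  { my @copy = @_;
                  my $sum = 0; $sum += $_ for @copy; return $sum }

cmpthese( -1, {
    aliased => sub { sum_aliased(@data) },
    copied  => sub { sum_copied(@data)  },
} );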