Re: Benchmark.pm: Does subroutine testing order bias results?

Does the order in which Benchmark.pm tests various subroutines bias the results which Benchmark reports?

I think that you and others have done a good job of documenting that it can.

As a suggestion, it might be feasible to patch Benchmark.pm to get around this issue. Any change you make will produce a different set of results, and it's hard to say which set is "correct", but in practical terms one or another of these modes might help when investigating a specific timing issue:

You could interleave the subroutine calls in some randomized order, rather than running each subroutine's iterations sequentially.
Alternatively, you could fork and have each child time only one of the subroutines and then pass the data back to the parent for integration.
Where growth in the process's memory allocation is the issue, it might help to start by performing a few extra iterations of each of the provided subroutines, and throwing those results out, before starting the real timing runs.
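The forking idea might be sketched roughly as follows. This is only an illustration, not a patch to Benchmark.pm: fork_timethese is a hypothetical helper name, and the sketch assumes a platform with a working fork and relies on a Benchmark timing object being a blessed array that can be passed through a pipe as plain numbers.

```perl
use strict;
use warnings;
use Benchmark qw(timeit cmpthese);

$| = 1;    # flush STDOUT so children do not inherit pending output

# Hypothetical helper (not part of Benchmark.pm): time each sub in its
# own forked child, so memory growth caused by one sub cannot affect
# the timing of the next, then collect the numbers in the parent.
sub fork_timethese {
    my ($count, $subs) = @_;
    my %results;
    for my $name (sort keys %$subs) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: time one sub and send the raw fields to the parent.
            close $reader;
            my $t = timeit($count, $subs->{$name});
            print {$writer} join(' ', @$t), "\n";
            close $writer;
            exit 0;
        }
        # Parent: read the child's numbers and rebuild a Benchmark object.
        close $writer;
        my $line = <$reader>;
        close $reader;
        waitpid($pid, 0);
        die "no timing data from child for $name\n" unless defined $line;
        chomp $line;
        $results{$name} = bless [ split ' ', $line ], 'Benchmark';
    }
    return \%results;
}

# The result has the same shape as what timethese returns,
# so it can be handed to cmpthese.
cmpthese(fork_timethese(200_000, {
    concat => sub { my $s = 'a' . 'b' },
    join   => sub { my $s = join '', 'a', 'b' },
}));
```

The two subs in the usage lines are placeholders. Because every child starts from a fresh copy of the parent, the order in which the subs are timed should matter much less than it does in a single process.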

Update: I implemented the second of these ideas as Benchmark::Forking.

Re^2: Benchmark.pm: Does subroutine testing order bias results? by hossman (Prior) on Jul 15, 2004 at 06:20 UTC
I don't know that modifying Benchmark's current cmpthese/timethese behavior is necessary, but a new method that supports interleaving might be useful...

    package Benchmark;
    use Benchmark ();            # load the real module first (timethis, timesum, n_to_for)
    use Carp;                    # croak
    use List::Util qw(shuffle);

    sub interleavethese {
        # based on timethese, but it merges the results from several small
        # iterations with the order shuffled each time.
        my($n, $iters, $alt, $style) = @_;
        die "usage: interleavethese(count, iters, { 'Name1'=>'code1', ... })\n"
            unless ref $alt eq 'HASH';
        my @names = sort keys %$alt;
        $style = "" unless defined $style;
        print "Benchmark: " unless $style eq 'none';
        if ( $n > 0 ) {
            croak "non-integer loopcount $n, stopped" if int($n)<$n;
            print "timing $iters sets of $n iterations of"
                unless $style eq 'none';
        } else {
            print "running" unless $style eq 'none';
        }
        print " ", join(', ',@names) unless $style eq 'none';
        unless ( $n > 0 ) {
            my $for = n_to_for( $n );
            print ", each for $iters iterations of at least $for CPU seconds"
                unless $style eq 'none';
        }
        print "...\n" unless $style eq 'none';

        my %results;
        for (my $i = 0; $i < $iters; $i++) {
            # a fresh random order for every set
            my @tasks = shuffle @names;
            foreach my $name (@tasks) {
                my $t = timethis ($n, $alt -> {$name}, $name, $style);
                $results{$name} = exists $results{$name}
                    ? timesum($results{$name}, $t) : $t;
            }
        }
        return \%results;
    }

    package main;

    #use it like this...
    # (interleavethese is not exported by Benchmark, so call it fully
    # qualified; \&test stands for whatever sub you want to time)
    use Benchmark qw[ cmpthese ];

    cmpthese(Benchmark::interleavethese(5, 3, {
        Atest => \&test,
        Btest => \&test,
        Ctest => \&test,
    }));

    cmpthese(Benchmark::interleavethese(-5, 3, {
        Atest => \&test,
        Btest => \&test,
        Ctest => \&test,
    }));
Re^2: Benchmark.pm: Does subroutine testing order bias results? by jkeenan1 (Deacon) on Jul 15, 2004 at 22:43 UTC
simonm: As a suggestion, it might be feasible to patch Benchmark.pm to get around this issue. ... You could fork and have each child only time one of the subroutines and then pass the data back to the parent for integration.

Here is a hack which implements simonm's suggestion, which was also made independently to me by Gary Benson of Perl Seminar New York over a fine Indian meal at Angon on East 6 Street in Manhattan. The hack involves three separate files and is probably modularizable, at least in part. To add or modify subroutines to be tested, edit the third file below.