
Sure. As you can see, it's not the lexically first test that gets biased; it's the first iteration of that test. That explains why the bias is more pronounced the fewer runs you do.
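One way to see the first-iteration effect directly is to time each call on its own rather than letting Benchmark average them. The following is only a rough sketch: `workload` and `time_each_call` are hypothetical names, the workload is modelled on the test sub used in this thread, and whether the first call stands out (and in which direction) will depend on your platform.

```perl
use strict;
use warnings;
use Time::HiRes qw( time );

# Hypothetical workload, modelled on the test sub used in this thread.
sub workload {
    my ($reps) = @_;
    my @strings = map { ' ' x 1000 } 1 .. $reps;
    return scalar @strings;
}

# Time $calls consecutive calls and return the elapsed seconds of each,
# so the first call can be compared with the ones that follow it.
sub time_each_call {
    my ( $calls, $reps ) = @_;
    my @elapsed;
    for ( 1 .. $calls ) {
        my $start = time;
        workload($reps);
        push @elapsed, time - $start;
    }
    return @elapsed;
}

my @elapsed = time_each_call( 5, 10_000 );
printf "call %d: %.4fs\n", $_ + 1, $elapsed[$_] for 0 .. $#elapsed;
```

If the first line of output differs noticeably from the rest, that is the same bias that shows up in the cmpthese chart when the iteration count is low.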

By running all the tests once and discarding the results, you even up the playing field, and the second cmpthese shows a much more even distribution.

You should also consider shutting down as much as possible of whatever else is running on your box for the duration of the tests. For example, if my dial-up connection times out during a test, a high-priority thread runs for the duration of the reconnect. That can completely skew the results.

Even using the mouse to pop up the task manager will have some effect. But if this is enough to obscure the gains you have made, it probably means that they are so small as to be subject to random variation anyway.
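To judge whether a gain is larger than the random variation, you can repeat each measurement a few times and compare the difference in means with the spread. This is a minimal sketch, not a proper statistical test: `mean_and_sd` and `looks_significant` are hypothetical helpers, the "difference must exceed the larger standard deviation" rule is an assumption made for illustration, and the timings are made up.

```perl
use strict;
use warnings;
use List::Util qw( sum );

# Mean and sample standard deviation of a list of timings.
sub mean_and_sd {
    my @t    = @_;
    my $mean = sum(@t) / @t;
    return ( $mean, 0 ) if @t < 2;
    my $var = sum( map { ( $_ - $mean )**2 } @t ) / ( @t - 1 );
    return ( $mean, sqrt $var );
}

# Crude rule of thumb (an assumption of this sketch): treat a difference
# as real only if the means differ by more than the larger of the two
# standard deviations.
sub looks_significant {
    my ( $old, $new ) = @_;    # array refs of per-run seconds
    my ( $m_old, $sd_old ) = mean_and_sd(@$old);
    my ( $m_new, $sd_new ) = mean_and_sd(@$new);
    my $noise = $sd_old > $sd_new ? $sd_old : $sd_new;
    return abs( $m_old - $m_new ) > $noise ? 1 : 0;
}

# Made-up timings in seconds, for illustration only.
my @before = ( 1.21, 1.20, 1.22, 1.21 );
my @after  = ( 1.20, 1.22, 1.21, 1.20 );

print looks_significant( \@before, \@after )
    ? "difference exceeds run-to-run noise\n"
    : "difference is within run-to-run noise\n";
```

In practice you would fill the two arrays with real per-run timings taken on an otherwise quiet machine.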

#! perl -slw
use strict;
use Benchmark qw[ cmpthese ];

our $ITERS ||= 5;
our $REPS  ||= 10000;

sub test {
    my @strings = map{ ' ' x 1000 } 1 .. $REPS;
}

my %tests = (
    Atest => \&test,
    Btest => \&test,
    Ctest => \&test,
    Dtest => \&test,
);

## Ignore the results produced by this run
cmpthese( 1, \%tests);

## These should show more even distribution.
cmpthese( $ITERS, \%tests);

P:\test>373536-2 -ITERS=10
        Rate Dtest Btest Ctest Atest
Dtest 4.27/s    --   -0%   -7%  -67%
Btest 4.27/s    0%    --   -7%  -67%
Ctest 4.59/s    7%    7%    --  -64%
Atest 12.8/s  200%  200%  179%    --

        Rate Ctest Dtest Atest Btest
Ctest 4.10/s    --   -1%   -1%   -1%
Dtest 4.13/s    1%    --   -0%   -0%
Atest 4.13/s    1%    0%    --   -0%
Btest 4.13/s    1%    0%    0%    --
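If the chart from the throwaway run is just clutter, the warm-up pass can be kept quiet. The sketch below assumes the optional third argument to cmpthese described in the Benchmark documentation: with the style 'none' no chart is printed and the rows are returned instead. The smaller workload and the two-entry `%tests` hash are choices made only to keep the example quick, and the "too few iterations" warning may still appear.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Smaller workload than in the post above, only so the sketch runs quickly.
my $REPS = 1000;

sub test {
    my @strings = map { ' ' x 1000 } 1 .. $REPS;
}

my %tests = (
    Atest => \&test,
    Btest => \&test,
);

# Warm-up pass: style 'none' suppresses the chart; the returned rows
# are simply discarded.
my $rows = cmpthese( 1, \%tests, 'none' );

# Measured pass: this is the chart to read.
cmpthese( 5, \%tests );
```

Only the second chart is printed, which makes it harder to read the biased warm-up numbers by mistake.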

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon

Re^6: Benchmark.pm: Does subroutine testing order bias results?
by jkeenan1 (Deacon) on Jul 19, 2004 at 21:30 UTC
    BrowserUk:

    The results were exactly as you predicted. Here is a set of tests of cmpthese() which parallels the results I posted earlier from runs on Win2K and Darwin. I will now try to adapt this approach to my original problem. Thanks for taking the time to look at this.

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    use Benchmark qw[ timethese cmpthese ];

    # Usage: buk.pl iterations records
    die "Need 2 numeric command-line arguments: $!"
        unless ( @ARGV == 2 and
            ($ARGV[0] =~ /^\d+$/ and $ARGV[0] > 0) and
            ($ARGV[1] =~ /^\d+$/ and $ARGV[1] > 0) );

    my ($iterations, $records) = @ARGV;

    print "\n# . Testing $iterations iterations of $records elements ...\n\n";

    my %tests = (
        Atest => \&test,
        Btest => \&test,
        Ctest => \&test,
        Dtest => \&test,
    );

    cmpthese( 1 , \%tests);    # to clear up memory per browseruk
    cmpthese( $iterations, \%tests);

    sub test {
        my @strings = map{ ' ' x 1000 } 1 .. $records;
    }

    __END__

    # 1.
    # Testing BrowserUk's second version of script intended
    # to work around problems in &Benchmark::cmpthese
    # Perl 5.8.4; Darwin (Mac OS X, version 10.3)

    # a. Testing 5 iterations of 25000 elements ...

    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
            Rate Dtest Ctest Btest Atest
    Dtest 1.05/s    --   -2%   -3%  -64%
    Ctest 1.07/s    2%    --   -2%  -63%
    Btest 1.08/s    3%    2%    --  -63%
    Atest 2.91/s  177%  173%  168%    --

            Rate Ctest Btest Atest Dtest
    Ctest 1.04/s    --   -0%   -0%   -0%
    Btest 1.04/s    0%    --    0%   -0%
    Atest 1.04/s    0%    0%    --   -0%
    Dtest 1.04/s    0%    0%    0%    --

    # b. Testing 5 iterations of 50000 elements ...

    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
          s/iter Ctest Dtest Btest Atest
    Ctest   5.70    --   -0%   -0%  -85%
    Dtest   5.70    0%    --   -0%  -85%
    Btest   5.69    0%    0%    --  -85%
    Atest  0.859  564%  564%  562%    --

          s/iter Ctest Btest Dtest Atest
    Ctest   5.79    --   -0%   -1%   -1%
    Btest   5.76    0%    --   -1%   -1%
    Dtest   5.72    1%    1%    --   -0%
    Atest   5.71    1%    1%    0%    --

    # c. Testing 50 iterations of 25000 elements ...

    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
            Rate Dtest Ctest Btest Atest
    Dtest 1.05/s    --   -2%   -3%  -64%
    Ctest 1.07/s    2%    --   -2%  -63%
    Btest 1.08/s    3%    2%    --  -63%
    Atest 2.91/s  177%  173%  168%    --

            Rate Ctest Btest Dtest Atest
    Ctest 1.02/s    --   -0%   -1%   -1%
    Btest 1.02/s    0%    --   -1%   -1%
    Dtest 1.03/s    1%    1%    --   -1%
    Atest 1.04/s    1%    1%    1%    --

    # d. Testing 50 iterations of 50000 elements ...

    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
          s/iter Dtest Ctest Btest Atest
    Dtest   5.70    --   -0%   -0%  -85%
    Ctest   5.70    0%    --   -0%  -85%
    Btest   5.69    0%    0%    --  -85%
    Atest  0.859  564%  564%  562%    --

          s/iter Dtest Atest Ctest Btest
    Dtest   5.75    --   -0%   -1%   -1%
    Atest   5.73    0%    --   -0%   -1%
    Ctest   5.72    1%    0%    --   -0%
    Btest   5.70    1%    1%    0%    --

    # e. Testing 100 iterations of 25000 elements ...

    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
            Rate Ctest Dtest Btest Atest
    Ctest 1.05/s    --   -0%   -3%  -62%
    Dtest 1.05/s    0%    --   -3%  -62%
    Btest 1.08/s    3%    3%    --  -61%
    Atest 2.78/s  165%  165%  156%    --

            Rate Atest Btest Dtest Ctest
    Atest 1.03/s    --   -0%   -0%   -1%
    Btest 1.03/s    0%    --   -0%   -1%
    Dtest 1.03/s    0%    0%    --   -0%
    Ctest 1.04/s    1%    1%    0%    --

    # f. Testing 100 iterations of 50000 elements ...

    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
          s/iter Ctest Dtest Btest Atest
    Ctest   5.70    --   -0%   -0%  -85%
    Dtest   5.70    0%    --   -0%  -85%
    Btest   5.70    0%    0%    --  -85%
    Atest  0.875  552%  552%  552%    --

          s/iter Btest Atest Ctest Dtest
    Btest   5.75    --   -1%   -1%   -1%
    Atest   5.72    1%    --   -0%   -0%
    Ctest   5.72    1%    0%    --   -0%
    Dtest   5.70    1%    0%    0%    --

    # 2.
    # Testing BrowserUk's second version of script intended
    # to work around problems in &Benchmark::cmpthese
    # Perl 5.8.0; Windows2000 Professional

    # a. Testing 5 iterations of 25000 elements ...

    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
            Rate Atest Dtest Btest Ctest
    Atest 1.20/s    --  -27%  -28%  -28%
    Dtest 1.64/s   36%    --   -2%   -2%
    Btest 1.67/s   38%    2%    --   -0%
    Ctest 1.67/s   38%    2%    0%    --

            Rate Btest Atest Ctest Dtest
    Btest 1.65/s    --   -0%   -1%   -1%
    Atest 1.66/s    0%    --   -0%   -1%
    Ctest 1.66/s    1%    0%    --   -1%
    Dtest 1.67/s    1%    1%    1%    --

    # b. Testing 5 iterations of 50000 elements ...

    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
          s/iter Atest Ctest Dtest Btest
    Atest   1.61    --  -25%  -25%  -26%
    Ctest   1.20   34%    --    0%   -1%
    Dtest   1.20   34%    0%    --   -1%
    Btest   1.19   35%    1%    1%    --

          s/iter Ctest Dtest Btest Atest
    Ctest   1.22    --   -0%   -1%   -1%
    Dtest   1.21    0%    --   -0%   -0%
    Btest   1.21    1%    0%    --   -0%
    Atest   1.21    1%    0%    0%    --

    # c. Testing 50 iterations of 25000 elements ...

    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
            Rate Atest Btest Dtest Ctest
    Atest 1.23/s    --  -23%  -25%  -27%
    Btest 1.61/s   31%    --   -2%   -5%
    Dtest 1.64/s   33%    2%    --   -3%
    Ctest 1.69/s   37%    5%    3%    --

            Rate Atest Btest Dtest Ctest
    Atest 1.67/s    --   -0%   -0%   -0%
    Btest 1.67/s    0%    --   -0%   -0%
    Dtest 1.67/s    0%    0%    --   -0%
    Ctest 1.67/s    0%    0%    0%    --

    # d. Testing 50 iterations of 50000 elements ...

    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
          s/iter Atest Dtest Ctest Btest
    Atest   1.61    --  -24%  -25%  -25%
    Dtest   1.22   32%    --   -1%   -2%
    Ctest   1.21   33%    1%    --   -1%
    Btest   1.20   34%    2%    1%    --

          s/iter Atest Ctest Btest Dtest
    Atest   1.21    --   -0%   -0%   -1%
    Ctest   1.21    0%    --   -0%   -1%
    Btest   1.21    0%    0%    --   -1%
    Dtest   1.20    1%    1%    1%    --

    # e. Testing 100 iterations of 25000 elements ...

    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
            Rate Atest Ctest Dtest Btest
    Atest 1.23/s    --  -26%  -28%  -28%
    Ctest 1.67/s   35%    --   -3%   -3%
    Dtest 1.72/s   40%    3%    --   -0%
    Btest 1.72/s   40%    3%    0%    --

            Rate Dtest Ctest Btest Atest
    Dtest 1.66/s    --   -0%   -1%   -1%
    Ctest 1.67/s    0%    --   -1%   -1%
    Btest 1.68/s    1%    1%    --   -0%
    Atest 1.68/s    1%    1%    0%    --

    # f. Testing 100 iterations of 50000 elements ...

    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
          s/iter Atest Ctest Dtest Btest
    Atest   1.64    --  -26%  -26%  -27%
    Ctest   1.22   34%    --   -1%   -2%
    Dtest   1.21   36%    1%    --   -2%
    Btest   1.19   38%    3%    2%    --

          s/iter Btest Ctest Atest Dtest
    Btest   1.21    --   -0%   -0%   -0%
    Ctest   1.21    0%    --   -0%   -0%
    Atest   1.21    0%    0%    --   -0%
    Dtest   1.20    0%    0%    0%    --

    System Info: same as in previous posting