Re^3: Benchmark.pm: Does subroutine testing order bias results?

I'll reply here so you get the notification.

The problem is not due to Benchmark.

I speculated that the first-run bias at low numbers of benchmark iterations could arise because, on the very first iteration, the storage required by the benchmark is allocated from 'virgin' memory.

This is a bit like having an empty disk drive. When you write the first few files to it, each one gets contiguous space that directly follows the last. No free-space chains need to be traversed. There is always enough space at the head of the free-space chain to allocate the next file to be written, because the head of the chain is the rest of the disk.

On the second and subsequent runs, the space freed by the first run is now a chain of blocks of various sizes that may need to be coalesced to fulfill any given request. The memory becomes fragmented, much like disk drives do.
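
To make the analogy concrete, here is a toy sketch in Perl. It is only a model built on my own assumptions, not how the CRT malloc or PERL_MALLOC actually work: a first-fit free list with no coalescing, where `toy_alloc` and `toy_free` are hypothetical names invented for the illustration. All it does is count how many free-list nodes have to be examined to satisfy each request.

```perl
use strict;
use warnings;

# Toy heap: a free list of [start, size] blocks. Initially one big block,
# i.e. the "empty disk" case. (Hypothetical model, for illustration only.)
my @free = ( [ 0, 1_000_000 ] );

# First-fit allocation. Returns the start address and the number of
# free-list nodes examined before a large enough block was found.
sub toy_alloc {
    my ($size) = @_;
    my $steps = 0;
    for my $i ( 0 .. $#free ) {
        $steps++;
        my $blk = $free[$i];
        next if $blk->[1] < $size;          # too small, keep walking
        my $start = $blk->[0];
        $blk->[0] += $size;                 # carve the request off the front
        $blk->[1] -= $size;
        splice @free, $i, 1 if $blk->[1] == 0;
        return ( $start, $steps );
    }
    die "toy heap exhausted\n";
}

# Freed blocks go to the head of the chain and are never coalesced.
sub toy_free {
    my ( $start, $size ) = @_;
    unshift @free, [ $start, $size ];
}

# "First run": 100 allocations from virgin memory.
my ( @blocks, $fresh_steps );
for ( 1 .. 100 ) {
    my ( $start, $steps ) = toy_alloc(100);
    push @blocks, $start;
    $fresh_steps += $steps;
}

# Free every other block, leaving 50 small holes at the head of the chain.
toy_free( $blocks[$_], 100 ) for grep { $_ % 2 == 0 } 0 .. $#blocks;

# "Second run": slightly larger requests must walk past every hole.
my $frag_steps = 0;
for ( 1 .. 100 ) {
    my ( undef, $steps ) = toy_alloc(150);
    $frag_steps += $steps;
}

print "fresh heap:      $fresh_steps nodes examined\n";   # 100
print "fragmented heap: $frag_steps nodes examined\n";    # 5100
```

In this model the fresh heap satisfies every request from the first node it looks at, while the fragmented heap walks past all 50 holes each time. A real allocator is far smarter than this, so treat the numbers as an illustration of the mechanism only.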

Tye pointed out {placeholder for the link} that the MS C runtime malloc() implementation is, um, sub-optimal for the way Perl uses memory. He suggested that I try building Perl to use PERL_MALLOC, which is tailored to Perl's requirements. (AS builds do not use it; maybe for good reason.)

I attempted this and discovered that the Makefile will only allow you to use PERL_MALLOC if you disable USE_IMP_SYS, which (though this is not stated in the Makefile hints) also precludes using USE_ITHREADS and USE_MULTIPLICITY.

It turns out that Steve Hays has pursued a similar strategy and has posted a patch at perl-win32-porters that bypasses a problem in the Win32 sbrk() implementation and allows Perl to build with the combination of PERL_MALLOC, USE_IMP_SYS, USE_ITHREADS and USE_MULTIPLICITY.

I also came up with a workaround, but Steve's is better than mine...and he is set up to produce patches properly whereas I am not.

I've tried thrashing the patch, applied to 5.8.4, with a test program (basically running my Benchmark above on 10, 20 and 30 threads simultaneously) and so far it appears to be stable. THIS IS NOT OFFICIAL. These are just my findings on a single, 1-CPU box. It may not be compatible with multi-CPU boxes. It may be that this is not a good test.

Steve also posted a [news://nntp.perl.org/perl.perl5.porters/93142|Smoke report] of the patch that shows it failing with 'inconsistent' and 'dubious' results from a threads test in the smoke test suite. Whether these are related to the patch or not is not yet clear to me.

The results I am seeing from running the Benchmark using PERL_MALLOC show a marked improvement in consistency from the first run to the last. The results are now biased against the first run rather than for it, but only very slightly, and the bias easily gets lost at higher numbers of iterations. It also performs *much* more quickly than the CRT malloc: around 5-7x faster. That is not quite back to 5.6.1 performance, but it is a very definite improvement. The benchmark was constructed to highlight and exacerbate the bias and may be atypical of Perl usage, but maybe not for what you are doing.

Whether the PERL_MALLOC/USE_IMP_SYS/USE_ITHREADS combination is truly safe is still not clear, but progress has been made.

If you can't or don't want to risk the transition to building with the patch yet, but need to exclude the vagaries of this first-run bias, the simplest expedient is to run cmpthese/timethese twice in each run. Run it the first time for a single iteration, which gets the memory allocated and somewhat fragmented, and discard those results. Then run the benchmark again (within the same program) for a largish number of iterations and use those figures.

FYI: PERL_MALLOC in the Makefile shows up as -Dusemymalloc in the Smoke reports and perl -V banners. You may already know this, but it confused me for a while. But then, I'm easily confused.
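
If you want to check which allocator and threading options a given perl was built with, a minimal sketch using the core Config module follows. The three keys shown are standard Config entries; which of them matter for your build is an assumption on my part.

```perl
use strict;
use warnings;
use Config;

# usemymalloc is 'y' when perl was built with its own malloc, 'n' otherwise.
# useithreads / usemultiplicity are 'define' when enabled, undefined when not.
for my $key (qw( usemymalloc useithreads usemultiplicity )) {
    my $value = defined $Config{$key} ? $Config{$key} : '(undef)';
    print "$key=$value\n";
}
```

From the command line, `perl -V:usemymalloc` reports the same value for a single key.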

Examine what is said, not who speaks.

"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

Re^4: Benchmark.pm: Does subroutine testing order bias results? by jkeenan1 (Deacon) on Jul 18, 2004 at 17:13 UTC
Woah! This is far deeper into the Perl internals than I have ever ventured. When I recently installed 5.8.4 on Darwin, it was the first time I had ever built from source. So I think I'm quite a ways away from exploring these issues. For the purpose of solving the problem I faced when I initiated this thread -- determining if an upgrade to one of my modules improves its performance in toto -- I think I'll KISS and use something like the script I posted in response to simonm above. But could you post some code that illustrates your approach of running `cmpthese()` or `timethese()` twice in a run, the first time for getting memory allocated and the second time for results? Thank you very much.
Re^5: Benchmark.pm: Does subroutine testing order bias results? by BrowserUk (Patriarch) on Jul 18, 2004 at 17:40 UTC
Sure. As you can see, it's not the lexically first test that gets the bias. It's the first iteration of that test, which explains why the bias is more pronounced the fewer runs you do. By running all the tests once and discarding the results, you even up the playing field, and the second cmpthese shows a much better distribution.

You should also consider shutting down as much else that is running on your box as you can for the duration of the tests. For example, if my dial connection times out during a test, a high-priority thread runs for the duration of the reconnect. That can completely skew the results. Even using the mouse to pop up the task manager will have some effect. But if this is enough to obscure the gains you have made, it probably means that they are so small as to be subject to random variation anyway.

    #! perl -slw
    use strict;
    use Benchmark qw[ cmpthese ];

    our $ITERS ||= 5;
    our $REPS  ||= 10000;

    sub test {
        my @strings = map{ ' ' x 1000 } 1 .. $REPS;
    }

    my %tests = (
        Atest => \&test,
        Btest => \&test,
        Ctest => \&test,
        Dtest => \&test,
    );

    ## Ignore the results produced by this run
    cmpthese( 1, \%tests);

    ## These should show more even distribution.
    cmpthese( $ITERS, \%tests);

    P:\test>373536-2 -ITERS=10
            Rate Dtest Btest Ctest Atest
    Dtest 4.27/s    --   -0%   -7%  -67%
    Btest 4.27/s    0%    --   -7%  -67%
    Ctest 4.59/s    7%    7%    --  -64%
    Atest 12.8/s  200%  200%  179%    --
            Rate Ctest Dtest Atest Btest
    Ctest 4.10/s    --   -1%   -1%   -1%
    Dtest 4.13/s    1%    --   -0%   -0%
    Atest 4.13/s    1%    0%    --   -0%
    Btest 4.13/s    1%    0%    0%    --
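
If you would rather not see the throwaway chart at all, a variant of the script above can pass Benchmark's 'none' style to the warm-up call. This is a minimal sketch, not a replacement for the script as posted: the iteration and repetition counts are placeholder values, and the "too few iterations" warnings may still be printed during the warm-up.

```perl
use strict;
use warnings;
use Benchmark qw[ cmpthese ];

my $ITERS = 10;        # placeholder values, adjust to taste
my $REPS  = 10_000;

sub test {
    my @strings = map { ' ' x 1000 } 1 .. $REPS;
    return scalar @strings;
}

my %tests = (
    Atest => \&test,
    Btest => \&test,
    Ctest => \&test,
    Dtest => \&test,
);

## Warm-up pass: one iteration each, chart suppressed, results discarded.
my $warmup = cmpthese( 1, \%tests, 'none' );

## Measured pass: these are the figures to use.
my $rows = cmpthese( $ITERS, \%tests );
```

`cmpthese` returns a reference to the rows of the chart (header row first), so the measured figures are also available in `$rows` if you want to post-process them instead of reading the printed table.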
Re^6: Benchmark.pm: Does subroutine testing order bias results? by jkeenan1 (Deacon) on Jul 19, 2004 at 21:30 UTC
BrowserUk: The results were exactly as you predicted. Here is a set of tests of `cmpthese()` which parallels the results I posted earlier from runs on Win2K and Darwin. I will now try to adapt this approach to my original problem. Thanks for taking the time to look at this.

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    use Benchmark qw[ timethese cmpthese ];

    # Usage:  buk.pl iterations records
    die "Need 2 numeric command-line arguments: $!"
        unless (
            @ARGV == 2 and
            ($ARGV[0] =~ /^\d+$/ and $ARGV[0] > 0) and
            ($ARGV[1] =~ /^\d+$/ and $ARGV[1] > 0)
        );

    my ($iterations, $records) = @ARGV;

    print "\n# . Testing $iterations iterations of $records elements ...\n\n";

    my %tests = (
        Atest => \&test,
        Btest => \&test,
        Ctest => \&test,
        Dtest => \&test,
    );

    cmpthese( 1 , \%tests);  # to clear up memory per browseruk
    cmpthese( $iterations, \%tests);

    sub test {
        my @strings = map{ ' ' x 1000 } 1 .. $records;
    }

    __END__

    # 1.
    # Testing BrowserUk's second version of script intended
    # to work around problems in &Benchmark::cmpthese
    # Perl 5.8.4; Darwin (Mac OS X, version 10.3)

    # a. Testing 5 iterations of 25000 elements ...

    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
            Rate Dtest Ctest Btest Atest
    Dtest 1.05/s    --   -2%   -3%  -64%
    Ctest 1.07/s    2%    --   -2%  -63%
    Btest 1.08/s    3%    2%    --  -63%
    Atest 2.91/s  177%  173%  168%    --
            Rate Ctest Btest Atest Dtest
    Ctest 1.04/s    --   -0%   -0%   -0%
    Btest 1.04/s    0%    --    0%   -0%
    Atest 1.04/s    0%    0%    --   -0%
    Dtest 1.04/s    0%    0%    0%    --

    # b. Testing 5 iterations of 50000 elements ...

    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
          s/iter Ctest Dtest Btest Atest
    Ctest   5.70    --   -0%   -0%  -85%
    Dtest   5.70    0%    --   -0%  -85%
    Btest   5.69    0%    0%    --  -85%
    Atest  0.859  564%  564%  562%    --
          s/iter Ctest Btest Dtest Atest
    Ctest   5.79    --   -0%   -1%   -1%
    Btest   5.76    0%    --   -1%   -1%
    Dtest   5.72    1%    1%    --   -0%
    Atest   5.71    1%    1%    0%    --

    # c. Testing 50 iterations of 25000 elements ...

    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
            Rate Dtest Ctest Btest Atest
    Dtest 1.05/s    --   -2%   -3%  -64%
    Ctest 1.07/s    2%    --   -2%  -63%
    Btest 1.08/s    3%    2%    --  -63%
    Atest 2.91/s  177%  173%  168%    --
            Rate Ctest Btest Dtest Atest
    Ctest 1.02/s    --   -0%   -1%   -1%
    Btest 1.02/s    0%    --   -1%   -1%
    Dtest 1.03/s    1%    1%    --   -1%
    Atest 1.04/s    1%    1%    1%    --

    # d. Testing 50 iterations of 50000 elements ...

    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
          s/iter Dtest Ctest Btest Atest
    Dtest   5.70    --   -0%   -0%  -85%
    Ctest   5.70    0%    --   -0%  -85%
    Btest   5.69    0%    0%    --  -85%
    Atest  0.859  564%  564%  562%    --
          s/iter Dtest Atest Ctest Btest
    Dtest   5.75    --   -0%   -1%   -1%
    Atest   5.73    0%    --   -0%   -1%
    Ctest   5.72    1%    0%    --   -0%
    Btest   5.70    1%    1%    0%    --

    # e. Testing 100 iterations of 25000 elements ...

    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
            Rate Ctest Dtest Btest Atest
    Ctest 1.05/s    --   -0%   -3%  -62%
    Dtest 1.05/s    0%    --   -3%  -62%
    Btest 1.08/s    3%    3%    --  -61%
    Atest 2.78/s  165%  165%  156%    --
            Rate Atest Btest Dtest Ctest
    Atest 1.03/s    --   -0%   -0%   -1%
    Btest 1.03/s    0%    --   -0%   -1%
    Dtest 1.03/s    0%    0%    --   -0%
    Ctest 1.04/s    1%    1%    0%    --

    # f. Testing 100 iterations of 50000 elements ...

    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
          s/iter Ctest Dtest Btest Atest
    Ctest   5.70    --   -0%   -0%  -85%
    Dtest   5.70    0%    --   -0%  -85%
    Btest   5.70    0%    0%    --  -85%
    Atest  0.875  552%  552%  552%    --
          s/iter Btest Atest Ctest Dtest
    Btest   5.75    --   -1%   -1%   -1%
    Atest   5.72    1%    --   -0%   -0%
    Ctest   5.72    1%    0%    --   -0%
    Dtest   5.70    1%    0%    0%    --

    # 2.
    # Testing BrowserUk's second version of script intended
    # to work around problems in &Benchmark::cmpthese
    # Perl 5.8.0; Windows2000 Professional

    # a. Testing 5 iterations of 25000 elements ...

    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
            Rate Atest Dtest Btest Ctest
    Atest 1.20/s    --  -27%  -28%  -28%
    Dtest 1.64/s   36%    --   -2%   -2%
    Btest 1.67/s   38%    2%    --   -0%
    Ctest 1.67/s   38%    2%    0%    --
            Rate Btest Atest Ctest Dtest
    Btest 1.65/s    --   -0%   -1%   -1%
    Atest 1.66/s    0%    --   -0%   -1%
    Ctest 1.66/s    1%    0%    --   -1%
    Dtest 1.67/s    1%    1%    1%    --

    # b. Testing 5 iterations of 50000 elements ...

    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
          s/iter Atest Ctest Dtest Btest
    Atest   1.61    --  -25%  -25%  -26%
    Ctest   1.20   34%    --    0%   -1%
    Dtest   1.20   34%    0%    --   -1%
    Btest   1.19   35%    1%    1%    --
          s/iter Ctest Dtest Btest Atest
    Ctest   1.22    --   -0%   -1%   -1%
    Dtest   1.21    0%    --   -0%   -0%
    Btest   1.21    1%    0%    --   -0%
    Atest   1.21    1%    0%    0%    --

    # c. Testing 50 iterations of 25000 elements ...

    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
            Rate Atest Btest Dtest Ctest
    Atest 1.23/s    --  -23%  -25%  -27%
    Btest 1.61/s   31%    --   -2%   -5%
    Dtest 1.64/s   33%    2%    --   -3%
    Ctest 1.69/s   37%    5%    3%    --
            Rate Atest Btest Dtest Ctest
    Atest 1.67/s    --   -0%   -0%   -0%
    Btest 1.67/s    0%    --   -0%   -0%
    Dtest 1.67/s    0%    0%    --   -0%
    Ctest 1.67/s    0%    0%    0%    --

    # d. Testing 50 iterations of 50000 elements ...

    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
          s/iter Atest Dtest Ctest Btest
    Atest   1.61    --  -24%  -25%  -25%
    Dtest   1.22   32%    --   -1%   -2%
    Ctest   1.21   33%    1%    --   -1%
    Btest   1.20   34%    2%    1%    --
          s/iter Atest Ctest Btest Dtest
    Atest   1.21    --   -0%   -0%   -1%
    Ctest   1.21    0%    --   -0%   -1%
    Btest   1.21    0%    0%    --   -1%
    Dtest   1.20    1%    1%    1%    --

    # e. Testing 100 iterations of 25000 elements ...

    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
            Rate Atest Ctest Dtest Btest
    Atest 1.23/s    --  -26%  -28%  -28%
    Ctest 1.67/s   35%    --   -3%   -3%
    Dtest 1.72/s   40%    3%    --   -0%
    Btest 1.72/s   40%    3%    0%    --
            Rate Dtest Ctest Btest Atest
    Dtest 1.66/s    --   -0%   -1%   -1%
    Ctest 1.67/s    0%    --   -1%   -1%
    Btest 1.68/s    1%    1%    --   -0%
    Atest 1.68/s    1%    1%    0%    --

    # f. Testing 100 iterations of 50000 elements ...

    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
          s/iter Atest Ctest Dtest Btest
    Atest   1.64    --  -26%  -26%  -27%
    Ctest   1.22   34%    --   -1%   -2%
    Dtest   1.21   36%    1%    --   -2%
    Btest   1.19   38%    3%    2%    --
          s/iter Btest Ctest Atest Dtest
    Btest   1.21    --   -0%   -0%   -0%
    Ctest   1.21    0%    --   -0%   -0%
    Atest   1.21    0%    0%    --   -0%
    Dtest   1.20    0%    0%    0%    --

System Info: same as in previous posting