in reply to Re^4: &1 is no faster than %2 when checking for oddness. (Careful what you benchmark)
in thread &1 is no faster than %2 when checking for oddness. Oh well.

And for some reason, it decided to run 'trivial2' 16 times as many times as 'trivial1'.

The reason is the same as the problem with the OP's original benchmark.

c:\test>perl -mstrict -we"1;"

c:\test>perl -mstrict -we"2;"
Useless use of a constant in void context at -e line 1.

1 is a special case among Perl constants that does not get optimised away, which is why I used  ... and 1; to avoid the "Useless use of a constant in void context" warning in my benchmark.
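
For example (the variable and the modulus test here are stand-ins of my own for the real benchmark code), this compiles and runs without the warning:

c:\test>perl -mstrict -we"my $x = 2; $x % 2 and 1;"

The $x % 2 is evaluated as an operand of and, so it is not in void context; and the trailing constant 1 is exempt from the void-context warning.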

This special case is so that things like these work:

while( 1 ) { ...
... if 1;
1 while ...;

However, 2 is not special and so gets optimised away. Hence trivial1 takes longer than trivial2, so the trivial2 loop has to be run many, many more times in order to accumulate the "at least 1 second of cpu" in

timethese -1, { trivial1 => sub {1}, trivial2 => sub {2}, };;
Benchmark: running trivial1, trivial2 for at least 1 CPU seconds...
Range iterator outside integer range at (eval 57) line 1, <STDIN> line 7.

I guess my machine is faster than yours. Enough faster that when Benchmark attempted to run the loop for sufficient iterations to accumulate the required cpu usage, it encountered my pet hate: the Perl range iterator's integer limit!

it took about 5 _minutes_ to run this benchmark.

Unsurprising. When the bodies of the timed loops do next to nothing, or actually nothing, Benchmark's initial timings--taken in order to calculate the number of iterations to run--involve subtracting a small amount to account for the overhead of the loop itself, with the result that the calculations are probably being subjected to rounding errors.

When it takes 84 million iterations of a test to accumulate 1 second of cpu on a modern processor, it certainly indicates that something is wrong with your benchmark.

This is why I tend to incorporate for loops within the test when benchmarking very small pieces of code, rather than relying on the benchmark iteration count.
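
A minimal sketch of that approach (the one-million inner count, and the thread's %2 versus &1 tests, are my own choices here):

use strict;
use warnings;
use Benchmark qw( cmpthese );

my $n = 12345;

# Each timed sub does the tiny operation a million times, so the real
# work dwarfs both the loop and Benchmark's per-iteration bookkeeping.
cmpthese -1, {
    modulus => sub { my $r; $r = $n % 2 for 1 .. 1_000_000 },
    bitand  => sub { my $r; $r = $n & 1 for 1 .. 1_000_000 },
};

The inner loop's cost is identical in both subs, so while it inflates the absolute rates, it cancels out of the comparison.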


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^6: &1 is no faster than %2 when checking for oddness. (Careful what you benchmark)
by Anonymous Monk on Nov 20, 2006 at 09:47 UTC

    Unsurprising. When the bodies of the timed loops do next to nothing, or actually nothing, Benchmark's initial timings--taken in order to calculate the number of iterations to run--involve subtracting a small amount to account for the overhead of the loop itself, with the result that the calculations are probably being subjected to rounding errors.

    Bingo. Exactly the reason I don't use Benchmark.

    When it takes 84 million iterations of a test to accumulate 1 second of cpu on a modern processor, it certainly indicates that something is wrong with your benchmark.

    I wanted to show an example where a recent version of Benchmark still produces negative numbers. Knowing that the chance of finding a benchmark producing negative numbers is higher on tests that don't take much time, I picked such a test.

    This is why I tend to incorporate for loops within the test when benchmarking very small pieces of code, rather than relying on the benchmark iteration count.

    This is why I don't bother with Benchmark at all; if I'm going to write my own loops, I don't need a module to subtract two time stamps for me.

      Fair enough. Personally I like the math that Benchmark::cmpthese() does for me.

      More to the point: even attempting to time operations that take so little time that Benchmark's internal math is subject to rounding errors is mostly pointless. More so if you only time one (or 10 or 100) occurrence(s) of that operation.

      With operations that require 84 million iterations to accumulate 1 second of cpu--that's 0.000000012 seconds per!--you're not timing the operation. You're timing the time it takes to get two successive TOD values from the OS!

      Your numbers will vary widely depending upon whether a task switch occurred inside your timing window. So widely that your results will be meaningless.

      The only way to derive any meaning from comparisons of such low-cost operations is to do them 1000s of times, time the entire loop, and then divide (hence my expectation of your math). Sure, that means the overhead of the loop is measured also, but if the same (and cheapest) loop mechanism is used for all the tests, then the same overhead will be in all timings.

      Whilst this renders the absolute values (end - start) completely useless for comparison purposes, the relative timing--

      ((end1 - start1)/n1) / ((end2 - start2)/n2)

      is a useful figure for comparison. Not only does this minimise the overhead of the loop, it also minimises the differences between your cpu performance and mine; your OS and mine. Hence the relative performance ratios (percentages) that result are useful, whereas absolute numbers--of wall time; cpu time; or iteration counts in a given period--are completely useless.
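
      A hand-rolled sketch of that calculation using Time::HiRes (the iteration counts and the two test expressions are placeholders of my own):

      use strict;
      use warnings;
      use Time::HiRes qw( time );

      my $n = 12345;
      my( $n1, $n2 ) = ( 1_000_000, 1_000_000 );

      # Time whole loops, not single operations; the (identical) loop
      # overhead appears in both measurements, so it cancels in the ratio.
      my $start1 = time;  my $r;  $r = $n % 2 for 1 .. $n1;  my $end1 = time;
      my $start2 = time;  my $s;  $s = $n & 1 for 1 .. $n2;  my $end2 = time;

      printf "relative: %.3f\n",
          ( ( $end1 - $start1 ) / $n1 ) / ( ( $end2 - $start2 ) / $n2 );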

      And guess what: these are exactly the figures that Benchmark::cmpthese() produces for you! And this is why I use Benchmark (and advocate the use of cmpthese() over timethese()).


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.