in reply to Re^3: &1 is no faster than %2 when checking for oddness. (Careful what you benchmark)
in thread &1 is no faster than %2 when checking for oddness. Oh well.

I do remember seeing that a long time ago, but I've not encountered it with recent versions?

The thing causing this behaviour is still there: running empty loops and subtracting times.

Here's an example:

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark;

timethese -1, {
    trivial1 => sub {1},
    trivial2 => sub {2},
};

__END__
Benchmark: running trivial1, trivial2 for at least 1 CPU seconds...
  trivial1:  1 wallclock secs ( 1.13 usr +  0.00 sys =  1.13 CPU) @ 4479726.55/s (n=5062091)
  trivial2:  3 wallclock secs ( 1.97 usr + -0.01 sys =  1.96 CPU) @ 43171185.71/s (n=84615524)
BTW, it took about 5 _minutes_ to run this benchmark. And for some reason, it decided to run 'trivial2' about 16 times as often as 'trivial1'.

Re^5: &1 is no faster than %2 when checking for oddness. (Careful what you benchmark)
by demerphq (Chancellor) on Nov 16, 2006 at 20:24 UTC

    What version of perl are you using?

    ---
    $world=~s/war/peace/g

      That test was run on perl 5.8.5.

        Could you please create a bug report using perlbug and mail it in? I've reported this myself in the past, and it would be nice to have third-party confirmation of it. We need to know your perl version and OS, and the test snippet you posted. For instance, I'm on Win32, and if you aren't, it would mean that the problem has wider impact than currently believed.

        Just for information's sake: I believe the problem is due to Benchmark trying to remove the timing of the "empty" loop from the results. Elsewhere in this thread you said something like "I don't need something to run two loops and subtract the times for me", but that's not what Benchmark does. It also times an empty loop, then subtracts the empty-loop time from the originals before doing the compare, the idea being to eliminate the overhead of the loop and timing process itself.
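        To illustrate the mechanism, here is a minimal sketch of the idea using Time::HiRes (my own illustration, not Benchmark's actual code):

            use Time::HiRes qw(time);

            sub time_loop {
                my ( $code, $n ) = @_;
                my $start = time;
                $code->() for 1 .. $n;
                return time - $start;
            }

            my $n     = 1_000_000;
            my $empty = time_loop( sub {}, $n );      # the "empty" calibration loop
            my $real  = time_loop( sub { 1 }, $n );   # the loop around the code under test
            my $net   = $real - $empty;               # what gets reported

            # When the payload costs about as much as the loop itself,
            # $net is noise-sized and can easily come out negative.
            printf "net: %g seconds\n", $net;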

        Where this goes wrong is when the amount of time it takes to benchmark your code is within the granularity of the timing routines (or the underlying numerical properties of the representation of the time). At that point you end up with the one-tick/two-tick problem: a call starts and ends within the same timestamp, effectively taking 0 time, or it starts and ends in adjacent timestamps and thus has a non-zero time, even though both calls take the same time to complete. (Nyquist comes to mind.)

        When the thing being benchmarked takes a few times longer than the empty loop, the difference averages out and the results are pretty accurate; for timing fast things, IMO it's pretty useless. What would be cool is if Benchmark, on detecting that the underlying loop was too fast, disabled the empty-loop subtraction. (Actually I wouldn't care if the empty loop was never removed, as I see it as a "fair penalty" on both.)

        So with your benchmark, what is happening is that you are timing two empty loops, subtracting one from the other, and then seeing the consequence of noise in the calculation. Which, as I've explained, can easily result in negative times. A negative time causes Benchmark to go into a degenerate loop, each time increasing the number of iterations it should use to get a good timing; in some cases this can result in an overflow of the for (0..x) {}. (I've seen this in make test left overnight.)
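        To make the blow-up concrete, here is some back-of-the-envelope arithmetic (the values are illustrative choices of mine, not Benchmark internals):

            # Benchmark scales up a short trial run to fill the requested CPU time:
            my $want_cpu = 1;       # timethese -1: run for at least 1 CPU second
            my $trial_n  = 1_000;   # iterations in the trial run
            my $net      = 1e-9;    # trial time left after the empty-loop subtraction

            my $needed = $trial_n * $want_cpu / $net;
            printf "estimated iterations: %g\n", $needed;   # ~1e12
            # A noise-sized $net inflates $needed until for (0 .. $needed) blows
            # past the integer range; a negative $net makes it meaningless entirely.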

        ---
        $world=~s/war/peace/g

Re^5: &1 is no faster than %2 when checking for oddness. (Careful what you benchmark)
by BrowserUk (Patriarch) on Nov 16, 2006 at 20:00 UTC
    And for some reason, it decided to run 'trivial2' about 16 times as often as 'trivial1'.

    The reason is the same as the problem with the OP's original benchmark.

    c:\test>perl -mstrict -we"1;"

    c:\test>perl -mstrict -we"2;"
    Useless use of a constant in void context at -e line 1.

    1 is a special case among Perl constants that does not get optimised away, and is the reason why I used ... and 1; to avoid the "Useless use of a constant in void context" warning in my benchmark.

    This special case is so that things like these work:

    while( 1 ) { ... }
    ... if 1;
    1 while ...;

    However, 2 is not special, and so gets optimised away. Hence trivial1 takes longer per call than trivial2, and the trivial2 loop has to be run many, many more times in order to accumulate the "for at least 1 CPU second" in

    timethese -1, { trivial1 => sub {1}, trivial2 => sub {2}, };;
    Benchmark: running trivial1, trivial2 for at least 1 CPU seconds ...
    [Range iterator outside integer range at (eval 57) line 1, <STDIN> line 7.

    I guess my machine is faster than yours. Enough faster that when Benchmark attempted to run the loop for sufficient iterations to accumulate the required cpu usage, it ran into my pet hate, the perl range iterator!

    it took about 5 _minutes_ to run this benchmark.

    Unsurprising. When the bodies of the test loops do next to nothing, or actually nothing, and Benchmark does its initial timings of them to calculate the number of iterations to run, it attempts to subtract a small amount to account for the overhead of the loop itself, with the result that the calculations are probably subject to rounding errors.

    When it takes 84 million iterations of a test to accumulate 1 second of cpu on a modern processor, it certainly indicates that something is wrong with your benchmark.

    This is why I tend to incorporate for loops within the test when benchmarking very small pieces of code, rather than relying on the benchmark iteration count.
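    Something along these lines (a sketch; the 1000-iteration inner loops and the test names are my own choices):

        use Benchmark qw(cmpthese);

        # Each test does 1000 iterations of the real work internally, so the
        # cost being timed dwarfs the per-call and timer overhead.
        cmpthese( -3, {
            bitand => sub { my $odd; $odd = $_ & 1 for 1 .. 1000 },
            modulo => sub { my $odd; $odd = $_ % 2 for 1 .. 1000 },
        });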


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Unsurprising. When the bodies of the test loops do next to nothing, or actually nothing, and Benchmark does its initial timings of them to calculate the number of iterations to run, it attempts to subtract a small amount to account for the overhead of the loop itself, with the result that the calculations are probably subject to rounding errors.

      Bingo. Exactly the reason I don't use Benchmark.

      When it takes 84 million iterations of a test to accumulate 1 second of cpu on a modern processor, it certainly indicates that something is wrong with your benchmark.

      I wanted to show an example where a recent version of Benchmark still produces negative numbers. Knowing that the chance of finding a benchmark producing negative numbers is higher on tests that don't take much time, I picked such a test.

      This is why I tend to incorporate for loops within the test when benchmarking very small pieces of code, rather than relying on the benchmark iteration count.

      This is why I don't bother with Benchmark at all; if I'm going to write my own loops, I don't need a module to subtract two time stamps for me.

        Fair enough. Personally I like the math that Benchmark::cmpthese() does for me.

        More to the point: even attempting to time operations that take so little time that Benchmark's internal math is subject to rounding errors is mostly pointless. More so if you only time one (or 10, or 100) occurrence(s) of that operation.

        With operations that require 84 million iterations to accumulate 1 second of cpu--that's 0.000000012 seconds per iteration!--you're not timing the operation. You're timing the time it takes to get two successive time-of-day (TOD) values from the OS!

        Your numbers will vary widely depending upon whether a task switch occurred inside your timing window. So widely that your results will be meaningless.

        The only way to derive any meaning from comparisons of such low-cost operations is to do them 1000s of times, time the entire loop, and then divide (hence my expectation of your math). Sure, that means the overhead of the loop is measured also, but if the same (and cheapest) loop mechanism is used for all the tests, then the same overhead will be in all the timings.
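        A hand-rolled sketch of that approach (Time::HiRes and the 1e6 counts are my own choices):

            use Time::HiRes qw(time);

            sub per_iteration {
                my ( $code, $n ) = @_;
                my $start = time;
                $code->() for 1 .. $n;
                my $end = time;
                # Loop overhead is included, but it is the same for every test.
                return ( $end - $start ) / $n;
            }

            my $t1 = per_iteration( sub { my $odd = 12345 & 1 }, 1_000_000 );
            my $t2 = per_iteration( sub { my $odd = 12345 % 2 }, 1_000_000 );
            printf "relative: %.3f\n", $t1 / $t2;   # the ratio is the meaningful figure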

        Whilst this renders the absolute values (end - start) completely useless for comparison purposes, the relative timing--

        ((end1 - start1)/n1) / ((end2 - start2)/n2)

        --is a useful figure for comparisons. Not only does this minimise the overhead of the loop, it also minimises the differences between your cpu performance and mine; your OS and mine. Hence the relative performance ratios (percentages) that result are useful, whereas absolute numbers--of wall time, cpu time, or iteration counts in a given period--are completely useless.

        And guess what: these are exactly the figures that Benchmark::cmpthese() produces for you! And this is why I use Benchmark (and advocate the use of cmpthese() over timethese()).


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.