in reply to Re^5: &1 is no faster than %2 when checking for oddness. (Careful what you benchmark)
in thread &1 is no faster than %2 when checking for oddness. Oh well.

That test was run on perl 5.8.5.

Re^7: &1 is no faster than %2 when checking for oddness. (Careful what you benchmark)
by demerphq (Chancellor) on Nov 20, 2006 at 12:36 UTC

    Could you please create a bug report using perlbug and mail it in? I've reported this myself in the past, and it would be nice to have third-party confirmation. We need to know your perl version, your OS, and the test snippet you posted. For instance, I'm on Win32, and if you aren't, it would mean the problem has wider impact than currently believed.

    Just for information's sake I believe the problem is due to Benchmark trying to remove the timing for the "empty" loop from the results. Elsewhere in this thread you said something like "I dont need something to run two loops and subtract the times for me", but that's not what Benchmark does. It also times an empty loop. It then subtracts the empty loop time from the originals before doing the compare, the idea being to eliminate the overhead of the loop and timing process itself.
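
    For illustration, here's a minimal, self-contained sketch of that idea (this is not Benchmark's actual code; the time_loop sub and the counts are mine, made up for illustration):

        use strict;
        use warnings;

        # Time a coderef over $iterations runs, in CPU seconds.
        sub time_loop {
            my ($code, $iterations) = @_;
            my ($u1, $s1) = times;
            $code->() for 1 .. $iterations;
            my ($u2, $s2) = times;
            return ($u2 + $s2) - ($u1 + $s1);
        }

        my $n     = 1_000_000;
        my $empty = time_loop(sub {},                 $n);  # loop + call overhead
        my $total = time_loop(sub { my $x = 42 % 2 }, $n);  # overhead + real work
        my $net   = $total - $empty;    # conceptually, what Benchmark reports
        printf "total %.2fs - empty %.2fs = net %.2fs\n", $total, $empty, $net;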

    Where this goes wrong is when the amount of time it takes to run your code is within the granularity of the timing routines (or the underlying numerical properties of the representation of the time). At that point you end up with the one-tick/two-tick problem: a call that starts and ends within the same timestamp effectively takes 0 time, while a call that starts and ends in adjacent timestamps has a non-zero time, even though both take the same time to complete. (Nyquist comes to mind.)
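
    Here's a small sketch that makes that granularity visible on a POSIX-ish system (the CLK_TCK lookup and the loop size are my assumptions; the available resolution differs by platform):

        use strict;
        use warnings;
        use POSIX qw(sysconf _SC_CLK_TCK);

        # times() advances in whole clock ticks, so work that finishes
        # inside one tick measures as 0 CPU seconds, and work straddling
        # a tick boundary measures as a full tick.
        my $hz = sysconf(_SC_CLK_TCK);   # ticks per second, often 100
        printf "clock granularity: %.4f s\n", 1 / $hz;

        for my $run (1 .. 5) {
            my ($u1, $s1) = times;
            my $x = 0;
            $x += $_ for 1 .. 10_000;    # small enough to fit inside a tick
            my ($u2, $s2) = times;
            printf "run %d: measured %.2f CPU seconds\n",
                $run, ($u2 + $s2) - ($u1 + $s1);
        }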

    When the thing being benchmarked takes a few times longer than the empty loop, the difference averages out and the results are pretty accurate; for timing fast things, IMO it's pretty useless. What would be cool is if Benchmark detected that the code being timed was so fast that it should disable the empty-loop subtraction. (Actually I wouldn't care if the empty loop was never removed, as I see it as a "fair penalty" on both.)

    So with your benchmark what is happening is you are timing two empty loops, subtracting one from the other and then seeing the consequence of noise in the calculation. Which, as I've explained, can easily result in negative times. Negative times in turn cause Benchmark to go into a degenerate loop, each time increasing the number of iterations it should use to get a good timing; in some cases this can result in an overflow of the for (0..x) {}. (I've seen this in make test left overnight.)
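
    To see why negative times send the scaling degenerate, here's a sketch (again, not Benchmark's actual code; the two timing subs are stand-ins that return nothing but noise near the clock granularity):

        use strict;
        use warnings;

        sub time_code  { return 0.01 * rand }   # noise-only stand-in
        sub time_empty { return 0.01 * rand }   # noise-only stand-in

        my $target = 3;       # goal: at least 3 CPU seconds of real work
        my $n      = 1_000;
        while (1) {
            my $net = time_code($n) - time_empty($n);   # can be <= 0
            last if $net >= $target;   # never true while net time is noise
            $n *= 2;                   # so the iteration count keeps doubling
            die "iteration count blew up at n=$n\n" if $n > 2**40;  # demo guard
        }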

    ---
    $world=~s/war/peace/g

      I have submitted bug reports against Benchmark, and I have even sent in patches, although not for this one particular bug. I'm done with Benchmark - my advice is not to use it, and I'm no longer spending effort trying to get it fixed. Fix one issue, and another pops up. The module tries to do too many clever things, and therefore succeeds at none. It has a lot of bells and whistles for something relatively simple. It's over-engineered.

      Just for information's sake I believe the problem is due to Benchmark trying to remove the timing for the "empty" loop from the results.

      Oh, I know. I'm quite familiar with the code.

      Elsewhere in this thread you said something like "I dont need something to run two loops and subtract the times for me", but that's not what Benchmark does. It also times an empty loop.

      Don't quote me out of context. I was replying to BrowserUK's technique of putting the loop inside the code to benchmark. Once you put the (or a) loop into the code you benchmark, anything Benchmark.pm tries to do to compensate for running an empty loop is fruitless.

      So with your benchmark what is happening is you are timing two empty loops, subtracting one from the other and then seeing the consequence of noise in the calculation.

      As I said elsewhere, I deliberately picked a benchmark with a tiny loop to quickly get an example with negative times. It does happen with other code as well, although far less often. And I wasn't going to spend a day constructing such an example. All I wanted to do was show that the problem wasn't an issue of the past (which was the claim being made).

      Note also that the benchmarks I showed that didn't use Benchmark were pointless: I was using gettimeofday() to get a timestamp, when I should of course have used times() (which is what Benchmark uses as well).

      Here's the corrected version:

      #!/usr/bin/perl

      use strict;
      use warnings;

      my $ITERATIONS = 10_000_000;
      my $RUNS       = 10;

      my $counter1 = 0;
      my $counter2 = 0;
      my $counter3 = 0;
      my $counter4 = 0;

      foreach (1 .. $RUNS) {
          my ($u1, $s1) = times;
          for (1 .. $ITERATIONS) {++$counter1 & 1 and 1}
          my ($u2, $s2) = times;
          for (1 .. $ITERATIONS) {++$counter2 % 2 and 1}
          my ($u3, $s3) = times;
          for (1 .. $ITERATIONS) {$a = ++$counter3 & 1}
          my ($u4, $s4) = times;
          for (1 .. $ITERATIONS) {$a = ++$counter4 % 2}
          my ($u5, $s5) = times;
          my $d1 = $u2 + $s2 - $u1 - $s1;
          my $d2 = $u3 + $s3 - $u2 - $s2;
          my $d3 = $u4 + $s4 - $u3 - $s3;
          my $d4 = $u5 + $s5 - $u4 - $s4;
          printf "And: %.2f Mod: %.2f; And: %.2f Mod: %.2f\n",
                 $d1, $d2, $d3, $d4;
      }

      __END__
      And: 2.89 Mod: 3.25; And: 2.82 Mod: 3.05
      And: 2.74 Mod: 3.21; And: 2.76 Mod: 3.05
      And: 2.69 Mod: 3.16; And: 2.91 Mod: 3.04
      And: 2.67 Mod: 3.15; And: 2.79 Mod: 3.21
      And: 2.71 Mod: 3.15; And: 2.75 Mod: 3.04
      And: 2.80 Mod: 3.16; And: 2.75 Mod: 3.04
      And: 2.69 Mod: 3.16; And: 2.93 Mod: 3.08
      And: 2.67 Mod: 3.15; And: 2.75 Mod: 3.19
      And: 2.69 Mod: 3.17; And: 2.75 Mod: 3.03
      And: 2.80 Mod: 3.18; And: 2.76 Mod: 3.05

      Now, do I care whether it also times the overhead of the loop? No. Either the overhead of the loop is significant, or it isn't. If it's significant, it doesn't matter (from a performance point of view) which solution I pick: even if I pick the slower one, the difference will only be noticeable in a so-called "tight" loop, and then the overhead of the loop itself becomes significant. And if the loop overhead isn't significant, well, then it doesn't really matter that I add the overhead to the results, does it?

      Now, if I really want to be fancy (and when I do need to benchmark something more seriously than something trivial on perlmonks), I run the benchmark 100 or 1000 times, keeping track of the results, discarding the lowest and highest 5% of the results, and averaging the rest (calculating the standard deviation as well). And I do it with different datasets. All things Benchmark doesn't support anyway.
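
      Something along these lines, say (a sketch of that procedure; the trimmed_stats sub and its naming are mine, nothing here comes from Benchmark):

          use strict;
          use warnings;

          # Run a coderef $runs times; drop the fastest and slowest 5% of
          # the CPU-time measurements, then return mean and std deviation.
          sub trimmed_stats {
              my ($code, $runs) = @_;
              my @t;
              for (1 .. $runs) {
                  my ($u1, $s1) = times;
                  $code->();
                  my ($u2, $s2) = times;
                  push @t, ($u2 + $s2) - ($u1 + $s1);
              }
              @t = sort { $a <=> $b } @t;
              my $cut = int(@t * 0.05);
              @t = @t[$cut .. $#t - $cut];          # discard the extremes
              my $mean = 0; $mean += $_ for @t; $mean /= @t;
              my $var  = 0; $var  += ($_ - $mean) ** 2 for @t; $var /= @t;
              return ($mean, sqrt $var);
          }

          my ($mean, $sd) = trimmed_stats(sub {
              my ($c, $x) = (0, 0);
              $x = ++$c & 1 for 1 .. 100_000;
          }, 100);
          printf "mean %.4fs, std dev %.4fs\n", $mean, $sd;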

        Don't quote me out of context. I was replying to BrowserUK's technique of putting the loop inside the code to benchmark. Once you put the (or a) loop into the code you benchmark, anything Benchmark.pm tries to do to compensate for running an empty loop is fruitless.

        Yes, I agree. And I quoted you out of context because I wanted to make a point about what Benchmark does wrong, not to set you up or make you look dumb or anything. Sorry if it seemed that way.

        Now, do I care whether it also times the overhead of the loop? No. Either the overhead of the loop is significant, or it isn't. If it's significant, it doesn't matter (from a performance point of view) which solution I pick.

        I agree entirely.

        Now, if I really want to be fancy (and when I do need to benchmark something more seriously than something trivial on perlmonks), I run the benchmark 100 or 1000 times, keeping track of the results, ..... And I do it with different datasets. All things Benchmark doesn't support anyway.

        I've often wanted, and a few times tinkered with, a more flexible and less "clever" Benchmark framework.

        ---
        $world=~s/war/peace/g

      Whoever takes a look at this should also consider the effects of task switching:

      cmpthese 1e6, { a=>q[ $_ = sleep 0 ], b=>q[ $_ = time() ] };;
      (warning: too few iterations for a reliable count)
               Rate    a    b
      a    533333/s   --  -90%
      b   5347594/s 903%   --

      Of course this is not a great test, but it does serve to maximise the effect of relinquished timeslices on the overall timings.

      I've looked at Benchmark a few times to see if I could see how to improve it, but it's not easy to find a generic solution. I've found that it is easier to work around its limitations than to fix them.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.