in reply to Inconsistent Results with Benchmark

... the only change I make is to switch the names of a_1 and a_2 so that they run in the opposite order.

One possibility: if the benchmarked subs/tests cause a fair amount of memory to be allocated, then the first sub/test to run pays the penalty not only of perl allocating that memory from its heap, but also of perl requesting that memory from the OS. By the time the second sub/test runs, the memory used by the first has been returned to the heap, but not to the OS, so the second runs more quickly because no (further) requests to the OS for memory are required.

Mitigation: add another subroutine, named so that it sorts lexically before the others, that simply allocates a large(r) amount of memory in small chunks. E.g.:

aaaaaaaaaa => q[ my @a; $a[ $_ ] = [ 1 .. 10 ] for 1 .. 1e6; ],

If you choose the constants in that snippet well, the dummy test forces the heap to be expanded up front, so that neither of your real tests requires perl to request more memory from the OS, and the benchmark is therefore more accurate.
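A minimal sketch of how that dummy entry might sit alongside two real tests. The names regex_a and regex_b, the test data in $text and the two regexes are made up for illustration, and the loop bound is scaled down to 1e5 so the sketch runs quickly; only the aaaaaaaaaa entry mirrors the line above. It relies on Benchmark running the tests in sorted name order.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Hypothetical test data: a package variable, so the string-eval'd
# tests can reach it as $main::text.
our $text = join ' ', map { "word$_" } 1 .. 50_000;

# Style 'none' suppresses the per-test output; -0.5 means
# "run each test for at least 0.5 CPU seconds".
my $results = timethese( -0.5, {
    # Sorts first, so it runs first and grows the heap
    # before either real test is timed.
    aaaaaaaaaa => q[ my @a; $a[ $_ ] = [ 1 .. 10 ] for 1 .. 1e5; ],

    # The tests actually being compared (illustrative only).
    regex_a => q[ my $n = () = $main::text =~ /\bword\d+\b/g; ],
    regex_b => q[ my $n = () = $main::text =~ /word[0-9]+/g; ],
}, 'none' );

# The dummy entry is not part of the comparison, so drop it
# before printing the table.
delete $results->{aaaaaaaaaa};
cmpthese($results);
```

How large the dummy allocation needs to be depends on what the real tests allocate, so treat the constants as something to tune rather than as fixed values.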

Note: that is just one of the possible causes; there are several others. If you post the particular code being tested, you might get more relevant possibilities and mitigations.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Inconsistent Results with Benchmark
by benwills (Sexton) on Dec 08, 2014 at 06:40 UTC

    That makes sense. I was loading about 500kb of text into a variable, then running the regular expressions on that.

    I just tested it (without your suggestion) with a file of about 2mb and saw a similar trend. Then tested with a smaller file and saw less of the trend.

    Then I used your idea, with some slight changes, sub a_(){ my @a; $a[$_] = 0 for 1 .. (4 * 1024); }, and tested with a few different file sizes. I'm not entirely sure whether the changes I made would stop that sub from functioning as you intended. I'm a mediocre programmer and new to Perl, so I don't fully understand how q and square brackets work in your code, even after just looking up some documentation. (I'm sure it'll sink in in a couple of days.)

    After I added that sub, I began consistently getting the same results with a 5 second timer as I do with a 60 second timer.

    Thanks for that. It was a bit discouraging earlier to find out that a few days worth of testing was mostly nullified. But understanding a little more about what's going on and figuring out how to compensate for it definitely helps.

    If you think it would be valuable for any reason to put my code up here, I can clean it up and get it up here. Otherwise, I think I'm good.

    (minor edits for clarification)
      I don't fully understand how q and square brackets work in your code, even after just looking up some documentation.

      Benchmark will accept a string containing a piece of code, where you normally supply a subroutine. From the synopsis:

      # Use Perl code in strings...
      timethese($count, {
          'Name1' => '...code1...',
          'Name2' => '...code2...',
      });

      # ... or use subroutine references.
      timethese($count, {
          'Name1' => sub { ...code1... },
          'Name2' => sub { ...code2... },
      });

      # cmpthese can be used both ways as well
      cmpthese($count, {
          'Name1' => '...code1...',
          'Name2' => '...code2...',
      });

      cmpthese($count, {
          'Name1' => sub { ...code1... },
          'Name2' => sub { ...code2... },
      });

      That's what my example did.

      What actually happens under the covers (greatly simplified) is that a call to the code reference (subroutine) that you supply to Benchmark is eval'd into another subroutine within the package that wraps that call in a loop:

      my ($subcode, $subref);
      if (ref $c eq 'CODE') {
          $subcode = "sub { for (1 .. $n) { local \$_; package $pack; &\$c; } }";
          $subref  = eval $subcode;
      }
      else {
          $subcode = "sub { for (1 .. $n) { local \$_; package $pack; $c;} }";
          $subref  = _doeval($subcode);
      }

      As you can see, if what you supply is a string rather than a code ref, that string is eval'd into that extra level of subroutine instead.
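To see that wrapping in isolation, here is a small standalone sketch that builds the same two kinds of wrapper by hand. It is not Benchmark's own code: the counters, the loop count of 3 and the variable names are invented for illustration, and a plain eval stands in for the internal _doeval.

```perl
use strict;
use warnings;

# Package variables, so the string-eval'd code can name them explicitly.
our $count_string = 0;
our $count_ref    = 0;

my $n    = 3;        # number of loop iterations (hypothetical)
my $pack = 'main';   # package the wrapped code is compiled in

# Case 1: the test is a string. Its text is pasted into the loop body.
my $code_string    = q[ $main::count_string++ ];
my $wrapped_string = eval
    "sub { for (1 .. $n) { local \$_; package $pack; $code_string; } }"
    or die $@;

# Case 2: the test is a code ref. The loop body is a call to it,
# so every iteration pays for one extra subroutine call.
my $c           = sub { $main::count_ref++ };
my $wrapped_ref = eval
    "sub { for (1 .. $n) { local \$_; package $pack; &\$c; } }"
    or die $@;

$wrapped_string->();
$wrapped_ref->();

print "string version ran $count_string times\n";   # 3
print "code ref version ran $count_ref times\n";    # 3
```

The only difference between the two wrappers is that the code ref version makes a sub call inside the loop while the string version runs its code inline.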

      From the Benchmark docs:

      CAVEATS

      Comparing eval'd strings with code references will give you inaccurate results: a code reference will show a slightly slower execution time than the equivalent eval'd string.

      So either use code refs, or strings, but do not mix the two. (Though in the case of our dummy sub that just forces preallocation of memory, it doesn't matter as it isn't a part of the timing.)



        Ah, that makes sense now. I knew several of those pieces, but didn't put them together like you just did.

        I saw that q returns a string, but was confused as to what Benchmark would do with that. But I also knew that Benchmark eval'd code in a loop. I just didn't put it all together.

        Thanks for taking the time to explain that.

      Wouldn't just running each sub once before starting the benchmark do just as well?
      --
      A math joke: r = | |csc(θ)|+|sec(θ)| |-| |csc(θ)|-|sec(θ)| |

        ysth,

        ++ for the good suggestion. And you can add a print statement for the *returned* results from each subroutine.

        I've seen examples of 'benchmark'ing 2 subroutines that don't produce the same results. Unless the results are the same, it doesn't make sense to compare the subroutines.
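A rough sketch that combines both suggestions: run each sub once before timing, print what it returned, and refuse to benchmark if the results differ. The test names, the data in $text and the regexes are placeholders, not the code from the original question.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data.
my $text = join ' ', map { "word$_" } 1 .. 50_000;

# Each sub returns its match count so the results can be compared.
my %tests = (
    regex_a => sub { my $n = () = $text =~ /\bword\d+\b/g; $n },
    regex_b => sub { my $n = () = $text =~ /word[0-9]+/g;  $n },
);

# Run every sub once: this is the warm-up, and it also collects
# the returned values for the sanity check.
my %result = map { $_ => $tests{$_}->() } sort keys %tests;
print "$_ returned $result{$_}\n" for sort keys %result;

# Only compare timings if all subs agree on the answer.
my ( $first, @rest ) = sort keys %result;
for my $name (@rest) {
    die "$name and $first disagree, not benchmarking\n"
        unless $result{$name} == $result{$first};
}

cmpthese( -0.5, \%tests );
```

Whether a single warm-up run is enough presumably depends on how much memory each sub allocates, so it is worth checking that the timings stay stable when the test order is swapped.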

        Regards...Ed

        "Well done is better than well said." - Benjamin Franklin

        No idea why I didn't think of that, but, based on some tests, it looks like that works. For good measure, I'm loading it twice. And it keeps me from having to guess at how large a string to create.

        Thank you.