husker has asked for the wisdom of the Perl Monks concerning the following question:

I'm playing with some code which does a lot of pushing (and some popping). Putting this code inside "timethese", I see that the array (@_, in this case) is NOT cleared out for each iteration of the code, so it grows larger and larger with each iteration. While I can certainly undef @_ at the beginning of the code so that it's empty, this seems like it should be unnecessary (it also adds significantly to the elapsed time of my otherwise fairly speedy code). Shouldn't "timethese" reset all the data structures upon each iteration of the code, so that the last iteration is guaranteed to have the same initial state as the first iteration? If not, why not? Is this behavior what you *expected* when you first discovered it (if indeed you have)? I'm using 5.005 on HP-UX 10.20.

Re: timethese
by btrott (Parson) on Jul 08, 2000 at 00:31 UTC
    No, timethese doesn't do any cleanup for you. You're using a special global variable (@_), and that's not going to be cleaned up automatically for you. One option would be to use a different variable, one you could declare lexically with my; that way your array would go out of scope after each iteration of the benchmark, and you'd be starting from scratch each time, as it were.

    If you do use lexicals, make sure that you declare the variable in the code that you're benchmarking; otherwise Benchmark won't be able to "see" your variable, because it won't be in the correct scope.

    You could use something like this:

    use Benchmark;

    # each sub declares its own lexical @foo, so every iteration starts fresh
    timethese(100, {
        first  => sub { my @foo = qw/bar foo/; my $last = pop @foo },
        second => sub { my @foo = qw/bar foo/; my $last = $foo[-1] },
    });
    Or whatever you're benchmarking.
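
To make the scoping point concrete, here is a minimal sketch that leaves Benchmark out entirely and just calls two subs in a loop. The names with_global, with_lexical and @global_stack are made up for illustration; the push/pop workload is only a guess at the kind of code being discussed.

```perl
use strict;
use warnings;

our @global_stack;    # package variable: survives between calls

# Hypothetical workload: push three values, pop one.
sub with_global {
    push @global_stack, 1, 2, 3;
    pop @global_stack;
    return scalar @global_stack;    # size left behind by this call
}

sub with_lexical {
    my @stack;                      # fresh, empty array on every call
    push @stack, 1, 2, 3;
    pop @stack;
    return scalar @stack;
}

my (@global_sizes, @lexical_sizes);
for (1 .. 5) {
    push @global_sizes,  with_global();
    push @lexical_sizes, with_lexical();
}

print "global:  @global_sizes\n";
print "lexical: @lexical_sizes\n";
```

The global array keeps whatever the previous call left in it, while the lexical one is rebuilt each time, which is the behaviour the original question was hoping to get from timethese.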

    Is that the behavior I expected? I think yes, because frankly it wouldn't make much sense for Benchmark to mess about with the @_ you're playing with; it should do as little as possible, because for example, maybe you don't *want* cleanup of your variables. This allows you to have more control over what's happening.

      I guess I disagree. The value of benchmarking is greatly reduced if each iteration of the code we are benchmarking starts off with even a slightly different state than the previous iteration. Even more dangerous is that some people probably aren't even AWARE that this is happening, and thus don't fully understand what's going on behind those benchmark numbers. (I also ran into the case where my benchmarking died because I ran out of memory ... I was running 1,000,000 iterations ... )

      As far as perhaps NOT wanting my variables cleared .. in what realistic benchmarking case would you think that I would want that behavior? I can't think of one myself ... but maybe you have run across that need before?

      To make things even less predictable, let's say I have two functions A and B that I am using in timethese, and that I use @_ in both like I described (pushing and popping). timethese clears @_ for the FIRST time I run A, but no subsequent times, and it clears @_ for the FIRST run of B, but no subsequent times.

      I don't think this is desired behavior ... at least not in my case.
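
One plausible explanation for the "cleared the first time, but not afterwards" pattern is that the harness calls the code with the bare "&$code;" form. That is an assumption about how the timing loop invokes the code, not something confirmed in this thread. What is certain is the Perl rule itself: a call written as "&$code;" with no parentheses does not set up a new @_, so the callee sees and modifies the caller's @_. The run_loop sub below is a hypothetical stand-in for a benchmark harness, not Benchmark's actual code.

```perl
use strict;
use warnings;

my @sizes;    # size of @_ seen at the start of each iteration

# The code under test: record the size of @_, then push and pop.
my $code = sub {
    push @sizes, scalar @_;
    push @_, 'a', 'b';
    pop @_;
};

# Hypothetical stand-in for a benchmark harness. "&$code;" without
# parentheses passes the caller's current @_ straight through.
sub run_loop {
    my $n = shift;    # @_ is now empty
    &$code for 1 .. $n;
}

run_loop(3);    # first "benchmark": @_ grows across the three iterations
run_loop(3);    # second one gets a new @_ from this fresh call

print "@sizes\n";
```

Under this sketch, each call to run_loop starts with its own @_, but every iteration inside that call shares it, so leftovers accumulate within one run and vanish at the start of the next.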

        Consider @_ as being any variable with higher scope than any local my'd variables found within the timethese code.

        Now consider the case that I don't want to time 1,000,000 iterations of the exact same case. Rather, I would like to time 1,000,000 iterations of my code on various cases.

        In fact, for real-world benchmarking (rather than simply using a benchmark to profile a particular (simple) piece of code), you would probably want to run your timethese using a statistical sample, or even a real sample, of test cases.

        If timethese could somehow magically (after all, how is it going to know which global variables you want to reset? isn't that what my is for anyway?) reset the code to its initial state (with zero time cost to boot!), then you would not be able to do a simple benchmark such as:

        (warning pseudo code)
        srand; timethese(1_000_000, { factorial => sub { factorial(int rand 1000) } });
        Which would give you a sort of average performance of your code over certain input ranges. However you could not do this benchmark if timethese magically reset your initial state (because then you would pick the same random number a million times).
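
A runnable sketch of that idea, under some assumptions: factorial here is a hypothetical function under test, the input range (0 to 19) and the counts are arbitrary, and the seed is fixed only so the sample is repeatable. The point it illustrates is that $calls is state that must survive from one iteration to the next, otherwise every iteration would time the same input.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);

# Hypothetical function under test.
sub factorial {
    my $n = shift;
    my $result = 1;
    $result *= $_ for 2 .. $n;
    return $result;
}

srand(42);                                        # fixed seed: repeatable sample
my @inputs = map { int(rand(20)) } 1 .. 100;      # sample of test cases

my $calls = 0;    # deliberately NOT reset between iterations
my $t = timeit(1000, sub {
    # walk through the sample, wrapping around at the end
    factorial($inputs[ $calls++ % @inputs ]);
});

print "$calls calls: ", timestr($t), "\n";
```

Pre-generating the sample keeps the cost of rand out of the timed code; if the harness reset $calls before each iteration, only $inputs[0] would ever be measured.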

        Please pardon me if I'm less than coherent. I wanted to make sure I posted something semi-useful today, but I didn't manage it until the wee hours of the morning.

        Ciao,
        Gryn
RE: timethese, and pushing array values
by Adam (Vicar) on Jul 08, 2000 at 00:31 UTC
    I've seen that before, and I fixed it by not using @_. I don't remember but I might have used my() on the variables I was using to keep the scopes completely separate. And no, it was not the behaviour I expected.