in reply to Re: timethese
in thread timethese, and pushing array values

I guess I disagree. The value of benchmarking is greatly reduced if each iteration of the code we are benchmarking starts off with even a slightly different state than the previous iteration. Even more dangerous is that some people probably aren't even AWARE that this is happening, and thus don't fully understand what's going on behind those benchmark numbers. (I also ran into the case where my benchmarking died because I ran out of memory ... I was running 1,000,000 iterations ... )

As far as perhaps NOT wanting my variables cleared .. in what realistic benchmarking case would you think that I would want that behavior? I can't think of one myself ... but maybe you have run across that need before?

To make things even less predictable, let's say I have two functions A and B that I am using in timethese, and that I use @_ in both like I described (pushing and popping). timethese clears @_ for the FIRST time I run A, but no subsequent times, and it clears @_ for the FIRST run of B, but no subsequent times.

I don't think this is desired behavior ... at least not in my case.
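One way to see what your installed Benchmark actually does with @_ is to measure it rather than guess. The following is a minimal sketch: the variable names $max_seen and $max_reset are made up for illustration, and the first number it prints depends on how your version of Benchmark calls the code, so no particular value is claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(timeit);

# Push onto @_ on every iteration and record the largest size observed.
my $max_seen = 0;
timeit(1000, sub {
    push @_, 'x';
    $max_seen = @_ if @_ > $max_seen;
});
print "largest \@_ seen without a reset: $max_seen\n";

# Same code, but @_ is emptied by hand at the top of each iteration.
my $max_reset = 0;
timeit(1000, sub {
    @_ = ();
    push @_, 'x';
    $max_reset = @_ if @_ > $max_reset;
});
print "largest \@_ seen with a reset: $max_reset\n";
```

If the first number is much larger than 1, each iteration is starting from the state the previous one left behind.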

Magic state resets
by gryng (Hermit) on Jul 08, 2000 at 10:34 UTC
    Consider @_ as being any variable with higher scope than any local my'd variables found within the timethese code.

    Now consider the case where I don't want to time 1,000,000 iterations of the exact same case. Rather, I would like to time 1,000,000 iterations of my code on various cases.

    In fact, for real-world benchmarking (rather than simply using a benchmark to profile a particular (simple) piece of code), you would probably want to run your timethese using a statistical sample, or even a real sample, of test cases.

    If timethese could somehow magically (after all, how is it going to know which global variables you want to reset? isn't that what my is for anyway?) reset the code to its initial state (with zero time cost to boot!), then you would not be able to do a simple benchmark such as:

    (warning: untested sketch; factorial() is assumed to be defined elsewhere)
    srand; timethese(1_000_000, { factorial => sub { factorial(int rand 1000) } });
    This would give you a sort of average performance of your code over a certain input range. However, you could not do this benchmark if timethese magically reset your initial state (because then you would pick the same random number a million times).
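The idea of timing code over a sample of test cases, rather than one fixed case, can be sketched as below. This is only an illustration under stated assumptions: factorial() is a stand-in implementation, the sample is 1000 random inputs below 20, and the index $i is deliberately kept between iterations so that each run picks up the next case.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);

# Stand-in for the code under test (assumed, not from the original post).
sub factorial {
    my ($n) = @_;
    my $f = 1;
    $f *= $_ for 2 .. $n;
    return $f;
}

# Build the sample once, outside the timed code, so every candidate
# implementation could be run against the same inputs.
srand(42);
my @cases = map { int rand 20 } 1 .. 1000;

# $i survives from one iteration to the next on purpose.
my $i = 0;
my $t = timeit(scalar @cases, sub { factorial($cases[ $i++ % @cases ]) });

print "sampled inputs: ", timestr($t), "\n";
```

Here the carried-over state is the whole point: resetting $i before every iteration would time the first case a thousand times.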

    Please pardon me if I'm less than coherent. I wanted to make sure I posted something semi-useful today, but I didn't manage it until the wee hours of the morning.

    Ciao,
    Gryn
      It is appropriate to have initialization code in your benchmarking routines. Having initialization code will affect the absolute performance of the code through its added overhead. However, when you use Benchmark/timethese you are looking to determine the relative performance of a piece of code. As long as you use the same initialization routine in all pieces of code being benchmarked, your results will be valid and useful. If you really want to know the absolute performance, you can also run a dummy example alongside your real code that contains only the initialization routine, and then factor out the initialization routine's effect when analyzing the results.
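Factoring out the initialization cost with a dummy run might look like the sketch below. The init() routine and the sort workload are hypothetical placeholders; timeit and timediff are the Benchmark functions used to time each variant and subtract one result from the other.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timediff timestr);

# Hypothetical setup routine shared by every benchmarked variant.
sub init { return [ reverse 1 .. 200 ] }

my $n = 2000;

# Dummy run: initialization only.
my $t_init = timeit($n, sub { my $data = init() });

# Real run: the same initialization plus the work being measured.
my $t_full = timeit($n, sub {
    my $data   = init();
    my @sorted = sort { $a <=> $b } @$data;
});

# Subtract the dummy from the real run to estimate the work alone.
my $t_net = timediff($t_full, $t_init);

print "init only:  ", timestr($t_init), "\n";
print "init + work:", timestr($t_full), "\n";
print "difference: ", timestr($t_net),  "\n";
```

On short runs the difference can be noisy, so treat it as an estimate rather than an exact figure.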
        Oh yes,
        I agree completely, lhoward. I think what I was trying to say last night was that init code should be done by hand, not by timethese. And as you point out, init code can still be factored out of a benchmark, if you still feel obsessive about it.

        Ciao,
        Gryn
      OK I can see your point ... in some benchmarking endeavors, you might not want to initialize variables on each iteration.

      However, to me this seems more like a "side effect" or "artifact" of Benchmark's behavior, not an explicitly designed mechanism to provide the kind of facility you are describing. It's this lack of a formal design that makes it dangerous ... I would bet you that half the people who use Benchmark aren't aware this is what happens and so they can't understand the numbers they are seeing.

      If this behavior *were* part of an intentional design decision, then I would think there would be a way to turn it off. For instance, the code I was originally working on was for use in the thread over at sieve of erothenes. The behavior of @_ in this case really punishes one of the algorithms being used (not mine, actually) because @_ grows and grows with each iteration. I doubt maverick knew that when he wrote his code, but it's his code that gets bitten by this. In any case, I think that this global variable behavior of Benchmark is either an accident, in which case it should be formalized somehow so everyone understands what is happening, or it *is* formalized, in which case there should be an option to defeat it.

        This is taken from perlman:perlsub:

        When to Still Use local()

        Despite the existence of my(), there are still three places where the local() operator still shines. In fact, in these three places, you must use local instead of my.

        1. You need to give a global variable a temporary value, especially $_.

          The global variables, like @ARGV or the punctuation variables, must be localized with local(). This block reads in /etc/motd, and splits it up into chunks separated by lines of equal signs, which are placed in @Fields.

              {
                  local @ARGV = ("/etc/motd");
                  local $/ = undef;
                  local $_ = <>;  
                  @Fields = split /^\s*=+\s*$/;
              } 
          

          In particular, it's important to localize $_ in any routine that assigns to it. Look out for implicit assignments in while conditionals.
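The warning about implicit assignments in while conditionals can be illustrated with a small sketch. The helper names count_lines_unsafe and count_lines_safe are hypothetical, and an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# while (<$fh>) assigns each line to the global $_, so the caller's
# value is overwritten as a side effect.
sub count_lines_unsafe {
    my ($fh) = @_;
    my $n = 0;
    while (<$fh>) { $n++ }
    return $n;
}

# Same loop, but $_ is localized first and restored on return.
sub count_lines_safe {
    my ($fh) = @_;
    local $_;
    my $n = 0;
    while (<$fh>) { $n++ }
    return $n;
}

my $text = "one\ntwo\nthree\n";

$_ = 'caller data';
open my $in1, '<', \$text or die "open failed: $!";
my $unsafe_count = count_lines_unsafe($in1);
my $after_unsafe = $_;

$_ = 'caller data';
open my $in2, '<', \$text or die "open failed: $!";
my $safe_count = count_lines_safe($in2);
my $after_safe = $_;

print "unsafe: \$_ is ", (defined $after_unsafe ? "'$after_unsafe'" : 'undef'), "\n";
print "safe:   \$_ is ", (defined $after_safe   ? "'$after_safe'"   : 'undef'), "\n";
```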

        ---------------------

        Basically, you're asking Benchmark to do one of two things. Either it has to reset all the global variables to the states they were in just before 'timethese' was called, or 'timethese' has to examine the benchmarked code, determine which global symbols your code uses, and reset only those.

        I think both of these are silly and horrible, to put it bluntly (sorry :) ). If you want to reset a variable you should do it yourself. And notice that you are warned in the documentation that you should localize (with local) any global variable that needs a temporary value. Of course, you can also just reset @_ or $_ with:
        @_ = ();
        Which (can) be faster than:
        local @_;
        However, you should note that the local call will keep memory usage down. Otherwise Perl doesn't seem to flush @_ = () memory in the same way.
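To check the speed claim on your own machine rather than take it on faith, the two reset styles can be timed side by side. This is a rough sketch: it uses a package array @buf (a made-up name) instead of @_, the workload of pushing ten elements is arbitrary, and no particular winner is claimed.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

our @buf;    # package variable standing in for any global array

my $results = timethese(20_000, {
    # Reset by assigning an empty list.
    assign_empty => sub { @buf = ();  push @buf, 1 .. 10 },
    # Reset by localizing; the old value comes back when the sub returns.
    localize     => sub { local @buf; push @buf, 1 .. 10 },
});
```

timethese prints a timing line for each entry and returns a hash reference of Benchmark objects keyed by name, which can be inspected further if needed.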

        Anyway, this feature of not resetting your state is surely not a "side effect". Having Benchmark reset the state of global variables (think about filehandles), as an option or otherwise, would definitely be an interesting endeavor. However, I do not think it would be worthwhile, nor possible to implement cleanly or quickly.

        Seriously, how hard is it to put your own initialization code in? And does it not make sense that you would need to reset your conditions yourself? Lastly, the documentation (perl's) states that you should localize global variables that need temporary values, and Benchmark's documentation says nothing about resetting variables for you either.

        But cheers anyway, :)
        Gryn