Russ has asked for the wisdom of the Perl Monks concerning the following question:

I have decided to post this as a new SOPW question, because it is different from (though related to) the question which spawned it.

Thanks to all who posted their results. I had no idea we would see 21 posts (so far). What a helpful and energetic crowd this is... :-)

I was benchmarking some code incorrectly. btrott pointed out that the code being timed was not "seeing" the variables I intended it to use.

He is absolutely right. I don't understand why, though. (Why it does what it does, not why btrott is right) :-)

perldoc Benchmark says of timethis (which timethese calls),

Simple test scripts to simulate Benchmark.pm's behavior work as I expect. In other words, the eval'ed code can see the package variables.

In Benchmark.pm, the runloop subroutine does basically this:

    my $subcode = "sub { for (1 .. $n) { local \$_; package $pack; $c;} }";
    my $subref = eval $subcode;
    &$subref;
where $n is the number of iterations, $pack is the caller's package and $c is the code to execute.

So, simulating Benchmark could look roughly like this:

    package R;     # Just to demonstrate a variable in another package
    my $R = 10;

    package main;  # Just to prove we are "executing" in a different package
    my $subcode = "sub { local \$_; package R; print $R}";
    my $subref = eval $subcode;
    &$subref;
Executing this code will print 10. Reversing the packages (so $R is in main, and we execute the code in package R, but tell it to run in package main) also prints 10. Using only package main, it will still print 10. In other words, I have not found a way to make this code behave like Benchmark.

So, I don't yet understand why my original code doesn't access $now and %url. It clearly doesn't, but I don't know why.

Any takers? :-)

Russ

P.S. I have to assume I will be embarrassed at what I am missing, here, but please fire away.

P.P.S. Here are my results with the correct code. Grep is still faster than the other algorithms, so the original question still stands (how can Perl execute an O(N) plus an O(N log N) algorithm faster than a single O(N)?)

    Benchmark: timing 1000 iterations of Grep, Max, Ternary...
          Grep: 20 wallclock secs (19.99 usr + 0.03 sys = 20.02 CPU)
           Max: 22 wallclock secs (21.69 usr + 0.03 sys = 21.72 CPU)
       Ternary: 22 wallclock secs (22.58 usr + 0.00 sys = 22.58 CPU)

P.P.P.S. Many kudos to aighearach, lhoward, btrott and many others for their good work helping me explore this issue. You have worked wonders toward satisfying my (insatiable?) curiosity.
Thank you

Replies are listed 'Best First'.
Re: Benchmark.pm's scoping behavior
by chromatic (Archbishop) on Jun 21, 2000 at 07:42 UTC
    Here's another example just to show the difference between scopes:
        #!/usr/bin/perl -w
        use strict;

        package R;
        my $R = 10;
        use vars qw( $var );
        $var = 20;

        package main;
        print "R is >>$R<<\n";

        package R;
        print "R is >>$R<<\n";
        print "var is >>$var<<\n";
    Nifty, everything prints. All in the same file, everything okay, right? Let's mess with scoping a little bit:
        #!/usr/bin/perl -w
        use strict;

        {
            package R;
            my $R = 10;
            use vars qw( $var );
            $var = 20;
        }

        package main;
        print "R is >>$R<<\n";
        print "var is >>$R::var<<\n";

        package R;
        print "R is >>$R<<\n";
        print "var is >>$var<<\n";
    Whoops, errors. $R is unavailable in both main:: and R::, while $var is unavailable in main::. Here's how to fix that:
        #!/usr/bin/perl -w
        use strict;

        {
            package R;
            my $R = 10;
            use vars qw( $var );
            $var = 20;
        }

        package main;
        # print "R is >>$R<<\n";
        print "var is >>$R::var<<\n";

        package R;
        # print "R is >>$R<<\n";
        print "var is >>$var<<\n";
    There's no way to get to $R, so we'll comment it out. Lexical (or my) variables have block scope, if they're in a block, or file-scope, if they're not. That's why the first example worked, in the main package. (Yes, I put those braces in there just for this point.)

    Variables declared with 'use vars' have package scope, so we can switch to package R again and print $var without prepending the package name, as we have to do when we're in package main. Presumably the new our works much the same way in 5.6.
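
    For readers on Perl 5.6 or later, here is a minimal sketch of the same idea with our. One difference worth knowing: our creates a lexically scoped short name for the package variable, so the short name ends with the enclosing block and has to be declared again, while the variable itself and its value persist. The names $from_main and $from_R are only there for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    package R;
    our $var = 20;    # creates $R::var plus a short name valid inside this block
}

package main;
my $from_main = $R::var;    # the fully qualified name works from any package

package R;
our $var;                   # declare the short name again; the value is unchanged
my $from_R = $var;

print "var is >>$from_main<<\n";
print "var is >>$from_R<<\n";
```

    Both lines print 20, because both names refer to the same package variable $R::var.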

    I can't let the idea of lexical variables and scope go away without making at least one comment about the misunderstood and feared closure. Take a look at this:

        #!/usr/bin/perl -w
        use strict;

        {
            package R;
            my $R = 10;
            use vars qw( $var );
            $var = 20;
            sub show_R { $R };
        }

        package main;
        print "var is >>$R::var<<\n";
        print "var is ", R::show_R(), "\n";

        package R;
        print "var is >>$var<<\n";
        print "var is ", show_R(), "\n";
    Nifty, hmm? Because $R is available in the same scope where show_R() is defined, the subroutine has access to it, even when called outside of that scope. That's all a closure is, but that's not all it's good for.
      See above in my reply to btrott. You have provided a great discussion, and have certainly taught me something.

      Closures are cool. I've never had any trouble understanding lexical variables within some other block (like a sub). I guess I have never tried to declare a closure at file scope, so I haven't learned to fear them. It wouldn't have worked for me (at file scope), before today, so I guess the timing is better this way. :-)

      Thanks for sharing your knowledge...

      Russ

Re: Benchmark.pm's scoping behavior
by btrott (Parson) on Jun 21, 2000 at 07:06 UTC
    You're confusing package globals and lexical variables. You wrote this code to demonstrate that you can see package variables in package R:
        package R;
        my $R = 10;

        package main;
        my $subcode = "sub { local \$_; package R; print $R}";
        my $subref = eval $subcode;
        &$subref;
    That's not doing what you think it's doing. That code is accessing $R not because you've specified "package R" but because $R is lexically-scoped to the containing file. If you had been running your quasi-benchmark code in a different file, it wouldn't have worked. That's why, in your original code, the benchmarked code couldn't access $now and %url when you declared them with my: because they were declared lexical to the enclosing file, and your code was running in a different file.

    That's why you need to make your variables package globals, presumably using the vars pragma, but probably you could also use our, in Perl 5.6. Those package globals are in your package's namespace, so when Benchmark executes your code in the "caller's package", it's accessing the correct variables.
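
    The difference can be reproduced in a single file. This is only a sketch: FakeBench and run_in are made-up names standing in for Benchmark.pm, and the "other file" is imitated by a block that is compiled before the caller's lexical is declared. It passes the code in single quotes, as one normally would to Benchmark; note that the double-quoted string in the simulation above interpolates $R at the moment the string is built, so the eval only ever sees the literal 10.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for Benchmark.pm. Code inside this block cannot
# see lexicals that the "caller" declares further down the file.
{
    package FakeBench;

    sub run_in {
        my ($pack, $code) = @_;
        no strict 'vars';    # let unqualified names resolve to package variables
        my $sub = eval "package $pack; sub { $code }";
        die $@ if $@;
        return $sub->();
    }
}

package main;

our $global  = 20;    # package variable, i.e. $main::global
my  $lexical = 10;    # lexical, visible only from here to the end of the file

my $saw_global  = FakeBench::run_in('main', '$global');
my $saw_lexical = FakeBench::run_in('main', '$lexical');

print "global:  ", (defined $saw_global  ? $saw_global  : 'undef'), "\n";
print "lexical: ", (defined $saw_lexical ? $saw_lexical : 'undef'), "\n";
```

    The eval'ed code finds $main::global, but for $lexical it looks up the (empty) package variable $main::lexical, because the my variable is not in scope where the eval runs.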

    As for your other question:

    > P.P.S. Here are my results with the correct code. Grep is
    > still faster than the other algorithms, so the original
    > question still stands (how can Perl execute an O(N) plus
    > an O(N log N) algorithm faster than a single O(N)?)
    I don't know, but I think lhoward had the best suggestion: the sort may be O(N log N), but that's not the only measure of efficiency; how you code the algorithm, and how it's optimized, is also a big influence on speed. So running an O(N log N) algorithm in tight C code could still be faster than running an O(N) algorithm written in Perl.
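
    To try this on your own machine, something like the following sketch could be used. It is not the original benchmark: the data, the sub names max_sort and max_loop, and the iteration count are all assumptions, and the timings will vary with the Perl version and hardware. It passes code references instead of strings; code references are ordinary closures compiled in your own file, so they could see my variables as well and sidestep the scoping problem discussed above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Hypothetical data set; 'our' makes @data a package global.
our @data = map { int rand 1_000_000 } 1 .. 1_000;

# O(N log N): sort descending (the sort itself runs in C) and take the first.
sub max_sort { return ( sort { $b <=> $a } @data )[0] }

# O(N): a single pass, but every comparison is a Perl-level operation.
sub max_loop {
    my $max = $data[0];
    for my $n (@data) {
        $max = $n if $n > $max;
    }
    return $max;
}

timethese( 2000, {
    Sort => \&max_sort,
    Loop => \&max_loop,
} );
```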
      I see.

      You're right, I had been confused about lexical scope. I assumed a lexical was scoped to its package, but it is actually scoped to its enclosing block (in this case, the file).

      Between your explanation and chromatic's (below), this is a great discussion!

      So, I went to my Perl books to try to find this explanation (to see which sections I have apparently never read). Camel Page 107-108:

      ...lexically scoped declarations have an effect only from the point of the declaration to the end of the innermost enclosing block. ... But a package declaration merely declares the identity of the default package for the rest of the enclosing block.
      Well, there it is. Packages do not define a namespace for lexical variables, because they do not define an enclosing block.

      I believed, in my code, that lexical variables were "scoped" to their package. (I almost *never* use global variables... package globals or otherwise. Without my habit of mying every variable, I would not have stumbled across this, and would have lost an opportunity to learn even more.) :-)

      All of a sudden, my version of reality is being updated. ;-)

      I agree about my other question. grep and sort are obviously highly optimized. The coolest thing is, I haven't been able to find a crossover point (where sort and grep become slower than max). No matter how large the data set, and no matter how many elements must be sorted, it is still faster to do the sort algorithm than max. It should be amazing, but this is Perl, where miracles are commonplace... :-)

      Russ

Re: Benchmark.pm's scoping behavior
by btrott (Parson) on Jun 21, 2000 at 07:25 UTC
    Oh... everything I said before holds, but I thought of a little example I could use to prove that your code isn't doing what you think it is. Pull out the R package stuff into a separate file and call it R.pm:
        package R;
        my $R = 10;
    Now use that in your other file:
        use R;
        package main;
        my $subcode = "sub { local \$_; package R; print $R }";
        my $subref = eval $subcode;
        &$subref;
    Run it, and what do we get? Nothing. Well, if you have warnings on, you get some warnings about uninitialized values, and names only being used once.

    The point, though, is that once we've moved the declaration of $R as a lexical variable into a separate file, your code no longer does what you thought it would. Which means that it wasn't doing what you thought it was, in the first place. :)

RE: Benchmark.pm's scoping behavior
by Aighearach (Initiate) on Jun 21, 2000 at 08:28 UTC
    Updated: Yes, you are right, it was messed up. My bad.
      No, that's not quite right.

      In the first example, change increment() to return $data instead of undef, and then you can call error() without getting an uninitialized value error.

      In the second example, simply give $data a default value, call error() before your while block, and you'll see that it prints.

      Here's the code I used for an example:

          #!/usr/bin/perl -w
          use strict;

          my $data;
          my $ERROR = 0;

          # error(); # uncomment this to print $data before the loop

          while ( (defined $data++) && everything_is_okay() ) {
              print "looping for the $data time.\n";
              if ( $data > 5 ) {
                  $ERROR++;
                  error(); # print $data if there's an error
              }
          }

          sub everything_is_okay {
              if ( $ERROR ) {
                  print STDERR "ERROR LEVEL $ERROR\n";
                  exit $ERROR;
              }
              return 1;
          }

          sub error {
              print $data;
          }
      I put the defined check in there so the loop will get past the first iteration. $data++ (when $data is not initialized) evaluates to zero:
          my $data;
          print $data++;
      Overall, nice discussion. To comment on one point you raised, though... you wrote:
          sub error {
              print $data; # oops, doesn't work!
          }
      Well, it *does* work. If you try it, you'll see that it works just fine, because it's a lexical variable... lexical to the scope of the file. And the error sub is defined in the file, so $data is visible there.

      That's not to say that it's recommended, of course, because it can definitely cause you problems. Particularly when using mod_perl. You'll get the dreaded "Variable will not stay shared" message, because essentially you've defined a closure.

      That said, though, good points.
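
      A rough sketch of how that problem shows up. As I understand it, a registry-style handler compiles the whole script inside a subroutine, so a named sub in the script ends up nested inside another sub. The sub named handler here is a made-up stand-in for that wrapper, not mod_perl's actual code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the wrapper sub that a registry-style
# handler puts around a script.
sub handler {
    my $data = shift;

    # A named sub nested inside another sub: with warnings on, perl reports
    # 'Variable "$data" will not stay shared' while compiling this line.
    sub error { return $data }

    return error();
}

my $first  = handler(1);
my $second = handler(2);    # error() still sees the $data from the first call

print "first=$first second=$second\n";
```

      Passing $data to error() as an argument, or making error an anonymous sub stored in a lexical, avoids the stale value.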

        As it turns out, almost all my code has to run under mod_perl...