sacked has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I was reading through Effective Perl Programming, and at the end of Item 29, the author describes using closures with eval to "generate subroutines that have particular regular expressions 'locked in' with the same flexibility (and efficiency!) as if the expressions were specified at compile time." (pg. 114) Here is the sample subroutine that produces closures, along with extra code I added for testing:
#!/usr/local/bin/perl -w
use strict;
use re 'debug';

sub make_grep {
    my $pat = shift;
    eval 'sub { grep /$pat/o, @_ }';
}

my $foo = make_grep( q/sacked/);
warn 'assigned coderef to $foo';
my @mtchs = $foo->( qw(sacked soccer) );
warn 'called $foo->()';
I added use re 'debug' so I could see when $pat gets compiled, and found that it isn't compiled until $foo->() is called (the lines ending in "at anon line 11." and "at anon line 13." come from the warn statements):

assigned coderef to $foo at anon line 11.
Compiling REx `sacked'
size 4 first at 1
1: EXACT <sacked>(4)
4: END(0)
anchored `sacked' at 0 (checking anchored isall) minlen 6
Guessing start of match, REx `sacked' against `sacked'...
Found anchored substr `sacked' at offset 0...
Guessed: match at offset 0
Guessing start of match, REx `sacked' against `soccer'...
Did not find anchored substr `sacked'...
Match rejected by optimizer
called $foo->() at anon line 13.
Freeing REx: `sacked'


Then I started wondering what the difference was between assigning the coderef produced by string-eval'ing a single-quoted anonymous sub declaration (as above) and assigning a coderef created directly, without eval. So I changed make_grep above to this:
sub make_grep {
    my $pat = shift;
    sub { grep /$pat/o, @_ };
}
The rest of the code stayed the same. Here is the output (again, the "line 11" and "line 13" lines come from the warn statements):

assigned coderef to $foo at anon line 11.
Compiling REx `sacked'
size 4 first at 1
1: EXACT <sacked>(4)
4: END(0)
anchored `sacked' at 0 (checking anchored isall) minlen 6
Guessing start of match, REx `sacked' against `sacked'...
Found anchored substr `sacked' at offset 0...
Guessed: match at offset 0
Guessing start of match, REx `sacked' against `soccer'...
Did not find anchored substr `sacked'...
Match rejected by optimizer
called $foo->() at anon line 13.


What, if any, are the advantages of either method over the other? I have read many times that string eval should usually be avoided. In this case, both methods seem to produce the same subroutine, in which the regex is compiled only the first time the subroutine is called. I am confused as to why the author suggests using eval when simply assigning a coderef appears to accomplish the same thing. This topic was touched on in eval with sub, but I did not find a satisfactory answer in that thread.

Also, does anyone know why the eval'ed version triggers the "Freeing REx" output while the other version does not? (This occurs with Perl 5.6.0. Perl 5.7.2 (Devel) does not give the message for either version of the program.)

Thanks!

--sacked

Re: coderef assignment with and without eval
by suaveant (Parson) on Jul 24, 2001 at 22:31 UTC
    As of at least perl 5.6.0 you can use qr to precompile regexps... so something like...
    $re = qr/$regex/;
    $_ =~ $re;
    Granted, that is not what you asked... but it's much easier and more straightforward, in case you didn't already know about it.
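A minimal sketch of qr// in use, assuming perl 5.6 or later (for the `warnings` pragma); the pattern and the test strings are made up for illustration. The Regexp object is compiled once, where `qr//` runs, and can then be matched with `=~` directly or interpolated into a larger pattern.

```perl
use strict;
use warnings;

my $word = 'sacked';

# Compiled once, here. \Q...\E quotes any metacharacters in $word, and the
# /i flag is baked into the object.
my $re = qr/\b\Q$word\E\b/i;

print 'ref($re) is ', ref($re), "\n";

for my $s ('Sacked again', 'ransacked', 'was sacked.') {
    my $verdict = $s =~ $re ? 'match' : 'no match';
    print "$s: $verdict\n";
}

# The compiled object can also be embedded in a larger pattern.
my $twice = qr/$re.*$re/;
print 'twice: ', ('sacked, then sacked again' =~ $twice ? 'match' : 'no match'), "\n";
```

'ransacked' does not match because `\b` requires a word boundary before the "s".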

    Update: actually, since 5.005, it seems.

    Update 2: I tried this...

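    use Benchmark;
    sub make_grep {
        my $p = $_[0];
        # NB: double quotes interpolate $p *and* @_ into the source text
        # when make_grep runs, so this is not the same sub as make_grep2
        eval "sub { grep /$p/o, @_ }";
    }
    sub make_grep2 {
        my $p = $_[0];
        sub { grep /$p/o, @_ };
    }
    $re = '(ob|[tTs])';
    $a = make_grep($re);
    $b = make_grep2($re);
    $pat = qr/$re/;
    timethese(100000, {
        a => sub { $a->(qw(this is a test bob and jobe)) },
        b => sub { $b->(qw(this is a test bob and jobe)) },
        # NB: grep EXPR form: $pat is a (true) Regexp ref, so every
        # element passes and no match is performed
        c => sub { grep $pat, qw(this is a test bob and jobe) }
    });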
    ...and got this...
    Benchmark: timing 100000 iterations of a, b, c...
             a:  7 wallclock secs ( 5.92 usr +  0.00 sys =  5.92 CPU) @ 16891.89/s (n=100000)
             b:  5 wallclock secs ( 4.57 usr +  0.00 sys =  4.57 CPU) @ 21881.84/s (n=100000)
             c:  0 wallclock secs ( 0.62 usr +  0.00 sys =  0.62 CPU) @ 161290.32/s (n=100000)
    showing the compiled re to be much faster... maybe I did something wrong...
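One way to check a benchmark like the one above is to confirm, before timing anything, that every candidate returns the same matches. The sketch below is an illustration, not suaveant's code: the candidate names (`eval_o`, `qr_sub`, `inline`) are hypothetical, it uses single quotes for the string eval, and it calls `cmpthese` from the standard Benchmark module with a negative count (run each candidate for at least 1 CPU second). Actual rates will vary by machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $src   = '(ob|[tTs])';
my @words = qw(this is a test bob and jobe);

# String-eval sub with /o. Single quotes keep $pat and @_ as code to be
# compiled, rather than interpolating their values into the string.
my $eval_o = do {
    my $pat  = $src;
    my $code = eval 'sub { grep /$pat/o, @_ }';
    die $@ unless $code;
    $code;
};

# Closure over a qr// object compiled once, up front.
my $re     = qr/$src/;
my $qr_sub = sub { grep /$re/, @_ };

my %candidates = (
    eval_o => sub { $eval_o->(@words) },
    qr_sub => sub { $qr_sub->(@words) },
    inline => sub { grep { $_ =~ $re } @words },
);

# Sanity check before timing: every candidate must return the same matches,
# otherwise the benchmark compares different amounts of work.
my $want = join ',', $eval_o->(@words);
for my $name (sort keys %candidates) {
    my $got = join ',', $candidates{$name}->();
    die "$name returned ($got), expected ($want)\n" unless $got eq $want;
}

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%candidates);
```

With this word list, each candidate should return this, is, test, bob and jobe; "a" and "and" contain neither "ob" nor t, T or s.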

                    - Ant

      I believe the qr// case is much faster in your benchmark because it does not involve dereferencing a coderef and then invoking a subroutine. In any case, I like the idea of using qr// so I do not need to use string eval. I thought that since qr// precompiles regexes, the subroutine using it would be just as fast as the subroutine with eval. I found that isn't the case:

      #!/usr/bin/perl -w
      #
      # compiled regex with qr vs. eval and /o
      use strict;
      use Benchmark qw(timethese cmpthese);
      use vars qw( @words $foo $bar );

      @words = qw(heaven evan seven);

      sub anon_sub_eval {
          my $pat = shift;
          eval 'sub { grep /$pat/o, @_ }';
      }

      sub anon_sub_qr {
          my $p = shift;
          my $pat = qr/$p/;
          eval 'sub { grep /$pat/, @_ }';
      }

      $foo = anon_sub_eval( q/evan/);
      $bar = anon_sub_qr  ( q/evan/);

      my $results = timethese( -10, {
          anon_sub_eval => sub { $foo->(@words) },
          anon_sub_qr   => sub { $bar->(@words) },
      } );
      cmpthese($results);

      __END__
      Benchmark: running anon_sub_eval, anon_sub_qr, each for at least 10 CPU seconds...
      anon_sub_eval: 13 wallclock secs (10.73 usr + -0.01 sys = 10.72 CPU) @ 156307.84/s (n=1675620)
        anon_sub_qr: 12 wallclock secs (10.40 usr +  0.01 sys = 10.41 CPU) @ 136413.54/s (n=1420065)
                        Rate   anon_sub_qr anon_sub_eval
      anon_sub_qr   136414/s            --          -13%
      anon_sub_eval 156308/s           15%            --
      Does anyone know why the qr// version is slower?

      UPDATE: Thanks to a tip from suaveant, I changed the sub with qr// to not use slashes around the compiled regex:
      sub anon_sub_qr {
          my $p = shift;
          my $pat = qr/$p/;
          eval 'sub { grep $pat, @_ }';
      }
      __END__
      Benchmark: running anon_sub_eval, anon_sub_qr, each for at least 10 CPU seconds...
      anon_sub_eval: 11 wallclock secs (10.66 usr +  0.00 sys = 10.66 CPU) @ 158363.41/s (n=1688154)
        anon_sub_qr: 11 wallclock secs (10.39 usr +  0.00 sys = 10.39 CPU) @ 225783.45/s (n=2345890)
                        Rate anon_sub_eval   anon_sub_qr
      anon_sub_eval 158363/s            --          -30%
      anon_sub_qr   225783/s           43%            --
      Lesson learned: do not use slashes around a regex precompiled with qr// in a grep expression. (But the difference is nominal in a regular match expression: $f =~ /$re/ vs. $f =~ $re)
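A quick check of what the two grep forms actually do with a qr// object, using the word list from the benchmark above. This sketch assumes the goal is to keep only the matching words. `grep EXPR, LIST` evaluates EXPR once per element (with `$_` aliased to it), and a bare `$pat` is just a Regexp reference, which is always true, so no match is attempted. An explicit `$_ =~ $re`, or `/$re/`, performs the match.

```perl
use strict;
use warnings;

my $re    = qr/evan/;
my @words = qw(heaven evan seven);

# EXPR form: $re is evaluated for each element; a Regexp ref is always
# true, so every element passes and no match is performed.
my @expr = grep $re, @words;

# Explicit match against $_, in two equivalent spellings.
my @match = grep { $_ =~ $re } @words;
my @slash = grep /$re/, @words;

print "EXPR form: @expr\n";
print "=~ form:   @match\n";
print "/\$re/ form: @slash\n";
```

Note that neither "heaven" nor "seven" contains the substring "evan", so a real match keeps only "evan", while the EXPR form keeps all three words.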

      --sacked
Re: coderef assignment with and without eval
by no_slogan (Deacon) on Jul 24, 2001 at 22:26 UTC
    Try this with both versions of make_grep:
    $a = make_grep("a");
    $b = make_grep("b");
    print &$a(qw(ack thpt barf)), "\n";
    print &$b(qw(ack thpt barf)), "\n";
    With the eval, you get "ackbarf" and "barf", which is correct. Without it, you get "ackbarf" twice. String eval compiles a new sub. Without the eval, the two subs share the same compiled regex. Once the /o causes it to lock down, it is locked for both subs -- even though they see different values of $pat.
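The difference described above can be reproduced with a self-contained script. This is a sketch: `make_grep_eval` and `make_grep_closure` are hypothetical names for the two versions of make_grep from the original question, and the word list is the one from this reply. Each string eval compiles a new sub with its own match op, while the plain closures all share one op, so the /o on it locks in whichever pattern is compiled first.

```perl
use strict;
use warnings;

# String-eval version (as in the book): every call compiles a brand-new
# anonymous sub, so each closure gets its own match op and its own /o slot.
sub make_grep_eval {
    my $pat  = shift;
    my $code = eval 'sub { grep /$pat/o, @_ }';
    die $@ unless $code;
    return $code;
}

# Plain closure: every returned sub shares one compiled op tree, so the
# pattern locked in by /o on the first call is reused by all closures.
sub make_grep_closure {
    my $pat = shift;
    return sub { grep /$pat/o, @_ };
}

my @words = qw(ack thpt barf);

my $eval_a    = make_grep_eval('a');
my $eval_b    = make_grep_eval('b');
my $closure_a = make_grep_closure('a');
my $closure_b = make_grep_closure('b');

print 'eval a:    ', join(',', $eval_a->(@words)),    "\n";
print 'eval b:    ', join(',', $eval_b->(@words)),    "\n";
print 'closure a: ', join(',', $closure_a->(@words)), "\n";
print 'closure b: ', join(',', $closure_b->(@words)), "\n";
```

Consistent with the reply, the closure built for "b" is expected to return ack,barf as well, because the closure for "a" ran first and /o froze its pattern.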

    This technique is kind of nasty. Use qr//, like suaveant says. Be sure to compile the regex in the right place.

    sub make_grep {
        my $pat = shift;
        my $re = qr/$pat/;
        return sub { grep /$re/, @_ }
    }
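A usage sketch for the qr// version, reusing the test words from earlier in this reply. Because `qr/$pat/` runs on every call to make_grep, each returned closure captures its own compiled Regexp, so no /o is needed and the closures do not interfere with each other. The variable names `$grep_a` and `$grep_b` are just for illustration.

```perl
use strict;
use warnings;

# make_grep as suggested above: compile once per call with qr//, so each
# returned closure captures its own Regexp object.
sub make_grep {
    my $pat = shift;
    my $re  = qr/$pat/;
    return sub { grep /$re/, @_ };
}

my $grep_a = make_grep('a');
my $grep_b = make_grep('b');

my @words = qw(ack thpt barf);
print join('', $grep_a->(@words)), "\n";
print join('', $grep_b->(@words)), "\n";
```

This prints "ackbarf" and then "barf", the correct results that the string-eval version gave.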