in reply to Multiple uses of (?{ code }) do not appear to be called

Personally, I'm really bad at closures, but it seems clear the code block compiled in your regex is only compiled once and it's pushing onto the @o declared in the first call to foo(). It would seem closures pull a copy of their local scope with them when they stick around.

The obvious choices for articles here are "Why Closures?" (merlyn links to an article of his) and "Trying to understand closures."

I first ran into problems with closures in the mod_perl docs. I just went through all my posts, though, because I swear I asked a question similar to yours here about three years ago...

UPDATE: It isn't exactly clear to me why this code is cached (or whatever) though. Why does that code block (as rhesa puts it) "get closed over?"

-Paul

Re^2: Multiple uses of (?{ code }) do not appear to be called
by rhesa (Vicar) on Dec 29, 2006 at 12:56 UTC
    Yep. Adding a print pos() inside the (?{code}) block shows that it does give the right pos values:

    Moving the declaration of @o outside of sub foo makes it clear:

    In other words, the @o in the (?{code}) block has been closed over, while the one in the print statement is a fresh array. The closed array is no longer accessible outside the regexp.
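
As a rough analogy only (this uses a nested named sub rather than the regex engine, and the names outer and inner are made up for illustration), the same "compiled once, bound to the first @o" effect can be sketched like this. How (?{code}) itself behaves may differ between perl versions.

```perl
use strict;
use warnings;
no warnings 'closure';    # silence "Variable "@o" will not stay shared"

sub outer {
    my @o;                # outer() gets a fresh @o on every call

    # A named sub is compiled once, so it stays bound to the @o
    # that existed during the first call to outer().
    sub inner { push @o, @_; return scalar @o }

    my $seen_by_inner = inner(7, 12);
    return ( scalar(@o), $seen_by_inner );
}

my @first  = outer();
my @second = outer();

print "call 1: outer sees $first[0] items, inner sees $first[1]\n";
print "call 2: outer sees $second[0] items, inner sees $second[1]\n";
```

On the second call the nested sub keeps pushing onto the array from the first call, while outer() is looking at a new, empty one. That mirrors the empty "Offsets:" lines in the original problem.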

    A naughty solution might be the following (abusing something of a bug in perl; this may break in future versions):

    sub foo {
        my $window = "a b X20 c X5 d e X17 X12";
        my @o = () if pos; # conditional declaration makes @o "static"
        @o = ();           # reset it always
        my @m = ($window =~ m/(X\d+(?{push @o, pos()}))/g);
        print "Matches: @m";
        print "Offsets: @o";
        print " ";
    }
    __END__
    Matches: X20 X5 X17 X12
    Offsets: 7 12 20 24

    Matches: X20 X5 X17 X12
    Offsets: 7 12 20 24

    Matches: X20 X5 X17 X12
    Offsets: 7 12 20 24
      Ha ha, looks like you and I were on the same path. I think I'll avoid using the bug though it is useful to know.
        Glad you're going to avoid that "feature" :-) (see e.g. Re: static-like persistence of my variable due to trailing conditional and How does my work with a trailing conditional for previous discussions)

        I'd like to give you two reasonable workarounds. The first is simply using the global variable, with my added suggestion to enclose it in an anonymous block:

        { # limit scope
            my @o;
            sub foo {
                my $window = "a b X20 c X5 d e X17 X12";
                @o = ();
                my @m = ( $window =~ m/(X\d+(?{push @o, pos()}))/g );
                print "Matches: @m";
                print "Offsets: @o";
                print " ";
            }
        }
        This will make sure that only foo() can see @o.

        The second workaround is basically a rewrite of your code. It doesn't solve the general issue, but it avoids the use of the complicated (?{BLOCK}) feature for your particular case:

        sub foo {
            my $window = "a b X20 c X5 d e X17 X12";
            my( @o, @m );
            while( $window =~ m/(X\d+)/g ) {
                push @m, $1;
                push @o, $+[0];
            }
            print "Matches: @m";
            print "Offsets: @o";
            print " ";
        }
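
For anyone unfamiliar with $+[0] in the rewrite above: a small standalone sketch (the variable names @starts, @ends and @pos are just for illustration) of how the @- and @+ arrays relate to pos() after each successful /g match in a while loop.

```perl
use strict;
use warnings;

my $window = "a b X20 c X5 d e X17 X12";
my ( @starts, @ends, @pos );

while ( $window =~ m/(X\d+)/g ) {
    push @starts, $-[0];           # offset where the whole match begins
    push @ends,   $+[0];           # offset just past the end of the match
    push @pos,    pos($window);    # where the next /g match would resume
}

print "Starts: @starts\n";    # 4 10 17 21
print "Ends:   @ends\n";      # 7 12 20 24
print "pos():  @pos\n";       # same as the ends
```

The end offsets are the same numbers the (?{push @o, pos()}) version reports, which is why the loop can stand in for the code block here.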
Re^2: Multiple uses of (?{ code }) do not appear to be called
by bsdz (Friar) on Dec 29, 2006 at 12:56 UTC
    I must admit I am not too hot on closures either but another interesting observation is that making @o global appears to cure the problem.
    my @o;
    foo(); foo(); foo();

    sub foo {
        my $window = "a b X20 c X5 d e X17 X12";
        @::o = ();
        my @m = ($window =~ m/(X\d+(?{push @::o, pos()}))/g);
        print join(" ", "Matches:", @m, "\n");
        print join(" ", "Offsets:", @::o, "\n\n");
    }
    Could that be explained by closures too? I am still examining the re 'debug' output.
      I must admit I am not too hot on closures either but another interesting observation is that making @o global appears to cure the problem.

      Then perhaps instead of @::o = () you may want to use our in conjunction with local:

      #!/usr/bin/perl
      use strict;
      use warnings;

      sub foo {
          my $window = "a b X20 c X5 d e X17 X12";
          local our @o;
          my @m = $window =~ m/(X\d+(?{push @o, pos}))/g;
          print "Matches: @m,\n";
          print "Offsets: @o,\n\n";
      }

      foo; foo; foo;
      __END__
        I'll be damned. It feels a little contradictory but it works. Maybe it's time I re-read all that Perl literature again!
        Heh, nifty. But why not simply say our @o = ();? Surely the scope is still limited to the sub. I'm a bit puzzled by how local interacts with our here.
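
One possible way to see the difference, as a sketch only (with_local and with_assign are hypothetical names, and the regex is left out on purpose): our just gives a short name to the package variable @main::o, while local saves its current value and restores it when the sub exits. A plain our @o = () overwrites the global for good.

```perl
use strict;
use warnings;

@main::o = ('kept');      # pretend some other code already uses @main::o

sub with_local {
    local our @o;         # temporarily empty; old value restored on exit
    push @o, 'temp';
    return scalar @o;
}

sub with_assign {
    our @o = ();          # same global, but the old value is simply lost
    push @o, 'temp';
    return scalar @o;
}

my $n_local      = with_local();
my @after_local  = @main::o;      # ('kept'): local put it back
my $n_assign     = with_assign();
my @after_assign = @main::o;      # ('temp'): the assignment stuck

print "after with_local:  @after_local\n";
print "after with_assign: @after_assign\n";
```

So for a throwaway script both forms print the same offsets; local only matters if something else relies on @o keeping its value across the call.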

      Could that be explained by closures too? I am still examining the re 'debug' output.

      The debug output clearly shows you were correct in your first post: the matching does indeed function. The problem (which is definitely cured by using a global @o) is that your regex code was pushing onto the wrong @o, one no longer in any scope accessible by non-perl-deities.

      -Paul

Re^2: Multiple uses of (?{ code }) do not appear to be called
by demerphq (Chancellor) on Dec 30, 2006 at 14:58 UTC

    Why does that code block (as rhesa puts it) "get closed over?"

    Because the current code implementing both (?{}) and (??{}) is a hack. The code is compiled once for performance reasons. Unfortunately, it's not simple to make it not a closure without a performance penalty. The issue you have to think about is that the code could execute in almost any context due to qr//, so the variable binding needs to occur at match start and needs to handle the case where there are no variables with the appropriate names to bind to, and so on.

    You can see other warts in the implementation by making certain kinds of syntax error in the code: the error message will be distinctly unhelpful, and again, apparently it's a real bitch to fix.

    ---
    $world=~s/war/peace/g

      The only way to fix this is to take the Perl 6 approach and clean up the compilation semantics of regexes. All of these hacks are a direct result of trying to treat regexes as strings rather than as a real minilanguage. Interpolating variables into regexes prior to compilation is simply wrong. It destroys any semblance of lexical scoping for both variable bindings and error message location. If I were going to fix this in Perl 5, I'd make a lexically scoped pragma to compile regexes immediately with sane variable bindings to avoid all this two-pass compilation bogosity. Any other approach is just bandaids.

        Maybe I'm not seeing this properly. Do you mean the issue of /$foo/ where $foo is a string? If so, it's not clear to me how changing that makes the issue of rebinding compiled code into the pad of its usage context any easier.

        I can see how changing how variable interpolation is handled in regexes would make for a lot more flexibility in other respects, but it's not clear how it helps the immediate issue of this thread... Can you explain a bit more please?

        ---
        $world=~s/war/peace/g