Multiple uses of (?{ code }) do not appear to be called

bsdz has asked for the wisdom of the Perl Monks concerning the following question:

Re: Multiple uses of (?{ code }) do not appear to be called by jettero (Monsignor) on Dec 29, 2006 at 12:40 UTC
Personally, I'm really bad at closures, but it seems clear that the code block in your regex is compiled only once, and it's pushing onto the `@o` declared in the first call to `foo()`. It would seem that closures hold on to the variables of their enclosing scope when they stick around. The obvious articles to read here are "Why Closures?" (merlyn links to an article of his) and "Trying to understand closures." I first ran into problems with closures in the mod_perl docs. I also just went through all my old posts, because I swear I asked a question similar to yours here about three years ago...

UPDATE: It isn't exactly clear to me why this code is cached (or whatever), though. Why does that code block (as rhesa puts it) "get closed over"?

-Paul
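Strictly speaking, a closure does not carry a copy of its scope around: it keeps references to the variables themselves, and each call of the enclosing sub creates new instances of its `my` variables. Together, those two facts are what let a compiled-once block keep pushing onto an `@o` that nobody else can see any more. A minimal sketch of both effects with ordinary anonymous subs (`make_logger` is a made-up name, not something from the thread):

```perl
use strict;
use warnings;

# Hypothetical helper: every call creates a fresh lexical @log and returns
# two closures that both refer to that same @log (by reference, not a copy).
sub make_logger {
    my @log;
    my $add  = sub { push @log, @_ };
    my $show = sub { return "@log" };
    return ($add, $show);
}

my ($add1, $show1) = make_logger();
my ($add2, $show2) = make_logger();

$add1->(7);
$add1->(12);
$add2->(20);

print $show1->(), "\n";   # 7 12  ($show1 sees what $add1 pushed)
print $show2->(), "\n";   # 20    (the second call got its own @log)
```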
Re^2: Multiple uses of (?{ code }) do not appear to be called by rhesa (Vicar) on Dec 29, 2006 at 12:56 UTC
Yep. Adding a `print pos()` inside the `(?{code})` block shows that it does give the right `pos` values, and moving the declaration of `@o` outside of `sub foo` makes it clear what is going on. In other words, the `@o` in the `(?{code})` block has been closed over, while the one in the `print` statement is a fresh array. The closed-over array is no longer accessible outside the regexp.

A naughty solution might be the following (abusing something of a bug in perl; this may break in future versions):

```perl
foo(); foo(); foo();

sub foo {
    my $window = "a b X20 c X5 d e X17 X12";
    my @o = () if pos;   # conditional declaration makes @o "static"
    @o = ();             # reset it always
    my @m = ($window =~ m/(X\d+(?{push @o, pos()}))/g);
    print "Matches: @m\n";
    print "Offsets: @o\n";
    print "\n";
}
__END__
Matches: X20 X5 X17 X12
Offsets: 7 12 20 24

Matches: X20 X5 X17 X12
Offsets: 7 12 20 24

Matches: X20 X5 X17 X12
Offsets: 7 12 20 24
```
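The collapsed code for the "move `@o` outside of `sub foo`" experiment isn't recoverable here, so the following is only a sketch of that idea, not rhesa's actual code. With a single file-scoped lexical there is only ever one instance of `@o`, so it doesn't matter which instance the code block got bound to; the sketch returns the results instead of printing them so they are easy to check:

```perl
use strict;
use warnings;

# One file-scoped lexical: only a single instance of @o ever exists, so the
# regex code block and the caller necessarily see the same array.
my @o;

sub foo {
    my $window = "a b X20 c X5 d e X17 X12";
    @o = ();    # clear the shared array at the start of every call
    my @m = ($window =~ m/(X\d+(?{ push @o, pos() }))/g);
    return ("@m", "@o");
}

for my $call (1 .. 3) {
    my ($matches, $offsets) = foo();
    print "Matches: $matches\nOffsets: $offsets\n\n";
}
```

The trade-off is that `foo` is no longer reentrant: a nested or recursive call would clear the same `@o`.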
Re^3: Multiple uses of (?{ code }) do not appear to be called by bsdz (Friar) on Dec 29, 2006 at 13:01 UTC
Ha ha, looks like you and I were on the same path. I think I'll avoid using the bug, though it is useful to know about.
Re^4: Multiple uses of (?{ code }) do not appear to be called by rhesa (Vicar) on Dec 29, 2006 at 14:41 UTC
Re^5: Multiple uses of (?{ code }) do not appear to be called by bsdz (Friar) on Dec 29, 2006 at 14:53 UTC
Re^5: Multiple uses of (?{ code }) do not appear to be called by diotalevi (Canon) on Dec 29, 2006 at 15:57 UTC
Re^2: Multiple uses of (?{ code }) do not appear to be called by bsdz (Friar) on Dec 29, 2006 at 12:56 UTC
I must admit I am not too hot on closures either, but another interesting observation is that making `@o` global appears to cure the problem:

```perl
my @o;
foo(); foo(); foo();

sub foo {
    my $window = "a b X20 c X5 d e X17 X12";
    @::o = ();
    my @m = ($window =~ m/(X\d+(?{push @::o, pos()}))/g);
    print join(" ", "Matches:", @m, "\n");
    print join(" ", "Offsets:", @::o, "\n\n");
}
```

Could that be explained by closures too? I am still examining the `re 'debug'` output.
Re^3: Multiple uses of (?{ code }) do not appear to be called by blazar (Canon) on Dec 29, 2006 at 14:05 UTC
> I must admit I am not too hot on closures either but another interesting observation is that making @o global appears to cure the problem.

Then perhaps, instead of `@::o = ()`, you may want to use `our` in conjunction with `local`:

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub foo {
    my $window = "a b X20 c X5 d e X17 X12";
    local our @o;
    my @m = $window =~ m/(X\d+(?{push @o, pos}))/g;
    print "Matches: @m\n";
    print "Offsets: @o\n\n";
}

foo; foo; foo;
__END__
```
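The `local` is what makes this safe: it saves the package array on entry to `foo` and puts the old contents back on exit, so each call starts empty and nothing leaks out. A small sketch of that, where the pre-existing `'untouched'` value is made up purely to show the restore; `foo` returns the offsets as a string, because they have to be read before `local` undoes itself:

```perl
use strict;
use warnings;

@main::o = ('untouched');   # pretend something else already uses @main::o

sub foo {
    my $window = "a b X20 c X5 d e X17 X12";
    local our @o;           # save @main::o and give this call an empty one
    my @m = $window =~ m/(X\d+(?{ push @o, pos }))/g;
    return "@o";            # read the offsets before local restores the old array
}

my $first  = foo();
my $second = foo();
print "$first\n$second\n";  # 7 12 20 24 (twice)
print "@main::o\n";         # untouched
```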
Re^4: Multiple uses of (?{ code }) do not appear to be called by bsdz (Friar) on Dec 29, 2006 at 14:50 UTC
Re^5: Multiple uses of (?{ code }) do not appear to be called by blazar (Canon) on Dec 29, 2006 at 18:53 UTC
Re^4: Multiple uses of (?{ code }) do not appear to be called by rhesa (Vicar) on Dec 29, 2006 at 15:42 UTC
Re^5: Multiple uses of (?{ code }) do not appear to be called by ikegami (Patriarch) on Dec 29, 2006 at 19:14 UTC
Re^5: Multiple uses of (?{ code }) do not appear to be called by diotalevi (Canon) on Dec 29, 2006 at 15:58 UTC
Re^5: Multiple uses of (?{ code }) do not appear to be called by bsdz (Friar) on Dec 29, 2006 at 21:54 UTC
Re^3: Multiple uses of (?{ code }) do not appear to be called by jettero (Monsignor) on Dec 29, 2006 at 13:23 UTC
> Could that be explained by closures too? I am still examining the re 'debug' output.

The debug output clearly shows you were correct in your first post: the matching does indeed work. The problem (which is definitely cured by using a global `@o`) is that your regex code was pushing onto the wrong `@o`, one no longer in any scope accessible to non-Perl-deities.

-Paul
Re^2: Multiple uses of (?{ code }) do not appear to be called by demerphq (Chancellor) on Dec 30, 2006 at 14:58 UTC
> Why does that code block (as rhesa puts it) "get closed over?"

Because the current code implementing both `(?{})` and `(??{})` is a hack. The code is compiled once for performance reasons, and unfortunately it's not simple to make it not a closure without a performance penalty. The issue you have to think about is that, because of `qr//`, the code could execute in almost any context, so the variable binding needs to occur at match start and needs to handle the case where there are no variables with the appropriate names to bind to, etc., etc. You can see other warts in the implementation by making certain kinds of syntax error in the code: the error message will be distinctly unhelpful, and again, apparently it's a real bitch to fix.

--- $world=~s/war/peace/g
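The `qr//` point can be seen in a small sketch: the code block is compiled where the `qr//` is written, but it only runs wherever the pattern object is eventually matched, so there is no single obvious lexical scope to bind to. This sketch (with a made-up `offsets_in` helper) sidesteps the binding question with a localized package variable, in the same spirit as the `our` suggestions elsewhere in the thread:

```perl
use strict;
use warnings;

our @hits;   # package variable: looked up when the block runs, not bound at compile time

# The pattern object, code block included, is built here at file scope...
my $re = qr/X\d+(?{ push @hits, pos() })/;

# ...but the block only runs when the pattern is matched, inside this sub.
sub offsets_in {
    my ($str) = @_;
    local @hits;              # fresh, empty @hits for this call only
    1 while $str =~ /$re/g;   # each successful match runs the code block once
    return "@hits";
}

print offsets_in("a b X20 c X5 d e X17 X12"), "\n";   # 7 12 20 24
print offsets_in("X1 X22"), "\n";                     # 2 6
```

Because the pattern being interpolated is only a `qr//` object, this should not need `use re 'eval'`.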
Re^3: Multiple uses of (?{ code }) do not appear to be called by TimToady (Parson) on Dec 30, 2006 at 17:21 UTC
The only way to fix this is to take the Perl 6 approach and clean up the compilation semantics of regexes. All of these hacks are a direct result of trying to treat regexes as strings rather than as a real minilanguage. Interpolating variables into regexes prior to compilation is simply wrong. It destroys any semblance of lexical scoping for both variable bindings and error message location. If I were going to fix this in Perl 5, I'd make a lexically scoped pragma to compile regexes immediately with sane variable bindings to avoid all this two-pass compilation bogosity. Any other approach is just band-aids.
Re^4: Multiple uses of (?{ code }) do not appear to be called by demerphq (Chancellor) on Dec 31, 2006 at 12:37 UTC
Re: Multiple uses of (?{ code }) do not appear to be called by almut (Canon) on Dec 29, 2006 at 15:45 UTC
Yet another variant, which allows you to keep your original `my @o`. The idea is to wrap a kind of "throw-away" subroutine around the `push ...` code. The subroutine is created anew every time the `?{...}` code is called.

```perl
sub foo {
    my $window = "a b X20 c X5 d e X17 X12";
    my @o;
    my @m = ($window =~ m/(X\d+(?{(sub {push @o, @_})->(pos)}))/g);
    print join(" ", "Matches:", @m, "\n");
    print join(" ", "Offsets:", @o, "\n\n");
}
```

It thus works around the closure effect of "binding" the first (per-call) instance of the lexical `@o` to the `?{...}` code. (I'm not saying that this is in any way better than the other suggestions, in particular blazar's suggestion using `our @o`... but heh, Perl would only be half as much fun if there weren't always more than one way to do it :)
Re^2: Multiple uses of (?{ code }) do not appear to be called by bsdz (Friar) on Dec 29, 2006 at 15:48 UTC
I did try something similar using external anonymous subs but that didn't work. Interesting that it works if the sub is inline.
Re^3: Multiple uses of (?{ code }) do not appear to be called by almut (Canon) on Dec 29, 2006 at 18:09 UTC
Thing is, if you declare the sub outside of the `?{...}` code, that code will bind the first instance of the subroutine. Printing the coderefs shows what's going on:

```perl
foo(); foo(); foo();

sub foo {
    my $window = "a b X20 c X5 d e X17 X12";
    my @o;
    my $push_o = sub { push @o, @_ };
    print "subN=$push_o\n";
    my @m = ($window =~ m/(X\d+(?{print "sub1=$push_o\n"; $push_o->(pos)}))/g);
    print join(" ", "Matches:", @m, "\n");
    print join(" ", "Offsets:", @o, "\n\n");
}
```

Output:

```
subN=CODE(0x8119060)
sub1=CODE(0x8119060)
sub1=CODE(0x8119060)
sub1=CODE(0x8119060)
sub1=CODE(0x8119060)
Matches: X20 X5 X17 X12
Offsets: 7 12 20 24

subN=CODE(0x8131ac0)
sub1=CODE(0x8119060)
sub1=CODE(0x8119060)
sub1=CODE(0x8119060)
sub1=CODE(0x8119060)
Matches: X20 X5 X17 X12
Offsets:

subN=CODE(0x8119d50)
sub1=CODE(0x8119060)
sub1=CODE(0x8119060)
sub1=CODE(0x8119060)
sub1=CODE(0x8119060)
Matches: X20 X5 X17 X12
Offsets:
```

Despite three subroutines being created (--> "subN=..."), the one actually being called within `?{...}` is always the same, first one (--> "sub1=..."). This sub in turn is bound to the first instance of the variable `@o`...
Re^2: Multiple uses of (?{ code }) do not appear to be called by Anonymous Monk on Dec 30, 2006 at 06:41 UTC
There is an appreciation of this at http://use.perl.org/~jjore/journal/32016 and I like it too. Unfortunately, on perl-5.8.3 I get a segmentation fault.
Re^2: Multiple uses of (?{ code }) do not appear to be called by ikegami (Patriarch) on Mar 12, 2007 at 23:32 UTC
It's much easier, safer and faster to use package variables when dealing with regexps.

```perl
sub foo {
    my $window = "a b X20 c X5 d e X17 X12";
    local our @o;
    my @m = ($window =~ m/(X\d+(?{ push @o, pos }))/g);
    print join(" ", "Matches:", @m, "\n");
    print join(" ", "Offsets:", @o, "\n\n");
}
```

Update: Oops, this has already been said elsewhere in the thread. I didn't enter this thread at the root node.
Re: Multiple uses of (?{ code }) do not appear to be called by quester (Vicar) on Dec 30, 2006 at 02:49 UTC
I think it is just a variation on the "will not stay shared" theme. If you insert a named subroutine that refers to `@o` and `use warnings; use strict;`, you get a warning:

```perl
use warnings;
use strict;

sub foo {
    my @o;
    sub demonstration { @o };
}
```

```
Variable "@o" will not stay shared at xxx.pl line 6.
```

Changing `use warnings` to `use diagnostics` prints the long version of the message:

```
Variable "@o" will not stay shared at xxx.pl line 6 (#1)
    (W closure) An inner (nested) named subroutine is referencing a
    lexical variable defined in an outer subroutine.

    When the inner subroutine is called, it will probably see the value of
    the outer subroutine's variable as it was before and during the first
    call to the outer subroutine; in this case, after the first call to the
    outer subroutine is complete, the inner and outer subroutines will no
    longer share a common value for the variable. In other words, the
    variable will no longer be shared.

    Furthermore, if the outer subroutine is anonymous and references a
    lexical variable outside itself, then the outer and inner subroutines
    will never share the given variable.

    This problem can usually be solved by making the inner subroutine
    anonymous, using the sub {} syntax. When inner anonymous subs that
    reference variables in outer subroutines are called or referenced, they
    are automatically rebound to the current values of such variables.
```

It's unfortunate that it doesn't seem to warn about that when the variable is used in `(?{code})`.
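The behaviour the diagnostic describes can be reproduced with plain subs, which makes the analogy to the `(?{code})` block concrete. In this sketch (`outer` and `named_peek` are made-up names) the named inner sub is compiled once and stays bound to the `@o` from the first call, while the anonymous sub is rebound on every call, which is exactly the fix the diagnostic suggests:

```perl
use strict;
use warnings;
no warnings 'closure';   # silence the "will not stay shared" warning for this demo

sub outer {
    my ($val) = @_;
    my @o = ($val);                      # a fresh @o on every call to outer()
    sub named_peek { return "@o" }       # named: bound once, to the first @o
    my $anon_peek = sub { return "@o" }; # anonymous: bound to this call's @o
    return (named_peek(), $anon_peek->());
}

my @first  = outer(1);
my @second = outer(2);
print "named=$first[0]  anon=$first[1]\n";   # named=1  anon=1
print "named=$second[0]  anon=$second[1]\n"; # named=1  anon=2
```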
Re: Multiple uses of (?{ code }) do not appear to be called by Anonymous Monk on Dec 29, 2006 at 12:44 UTC
Try `use re 'debug';`
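A minimal way to try it on the thread's pattern: `use re 'debug'` writes the compiled regex program and a step-by-step trace of each match attempt to STDERR, while normal output is unaffected. The file-scoped `@o` here is a simplification so that the offsets can be checked afterwards. In the compile dump the code block should appear as an `EVAL` node, and the execution trace lets you confirm that it is actually reached for each match:

```perl
use strict;
use warnings;
use re 'debug';   # dump regex compilation and matching details to STDERR

my $window = "a b X20 c X5 d e X17 X12";
my @o;            # single file-scoped array for the code block to push onto
my @m = ($window =~ m/(X\d+(?{ push @o, pos() }))/g);
print "Matches: @m\nOffsets: @o\n";
```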