Re: Multiple uses of (?{ code }) do not appear to be called
by jettero (Monsignor) on Dec 29, 2006 at 12:40 UTC
|
Personally, I'm really bad at closures, but it seems clear the code block in your regex is only compiled once, and it keeps pushing onto the @o declared in the first call to foo(). It would seem closures hold on to the variables of the scope they were created in for as long as they stick around.
The obvious choices for articles here are "Why Closures?" (merlyn links to an article of his) and "Trying to understand closures."
I first ran into problems with closures in the mod_perl docs. I just went through all my posts, though, because I swear I asked a question similar to yours here about three years ago...
UPDATE: It isn't exactly clear to me why this code is cached (or whatever), though. Why does that code block (as rhesa puts it) "get closed over?"
| [reply] [d/l] [select] |
|
|
Yep. Adding a print pos() inside the (?{code}) block shows that it does give the right pos values:
Moving the declaration of @o outside of sub foo makes it clear:
In other words, the @o in the (?{code}) block has been closed over, while the one in the print statement is a fresh array. The closed array is no longer accessible outside the regexp.
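The effect can be mimicked without any regex at all. The sketch below is an analogy, not what perl does internally: $pusher and the value 42 are made up for illustration, and an ordinary anonymous sub that is created only once stands in for the once-compiled code block.

```perl
use strict;
use warnings;

my $pusher;    # hypothetical name; holds the single closure

sub foo {
    my @o;                              # a fresh array on every call
    $pusher ||= sub { push @o, @_ };    # created once, on the first call only
    $pusher->(42);                      # always pushes onto the first call's @o
    return scalar @o;                   # size of the current call's @o
}

my @counts = ( foo(), foo(), foo() );
print "@counts\n";                      # prints "1 0 0"
```

Only the first call sees its own push; later calls get a fresh, empty @o while the closure keeps feeding the first one.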
A naughty solution might be the following (abusing something of a bug in perl; this may break in future versions):
sub foo {
    my $window = "a b X20 c X5 d e X17 X12";
    my @o = () if pos; # conditional declaration makes @o "static"
    @o = (); # reset it always
    my @m = ($window =~ m/(X\d+(?{push @o, pos()}))/g);
    print "Matches: @m\n";
    print "Offsets: @o\n";
    print "\n";
}
__END__
Matches: X20 X5 X17 X12
Offsets: 7 12 20 24
Matches: X20 X5 X17 X12
Offsets: 7 12 20 24
Matches: X20 X5 X17 X12
Offsets: 7 12 20 24
| [reply] [d/l] [select] |
|
|
Ha ha, looks like you and I were on the same path. I think I'll avoid using the bug though it is useful to know.
| [reply] |
I must admit I am not too hot on closures either but another interesting observation is that making @o global appears to cure the problem.
my @o;
foo();
foo();
foo();
sub foo {
    my $window = "a b X20 c X5 d e X17 X12";
    @::o = ();
    my @m = ($window =~ m/(X\d+(?{push @::o, pos()}))/g);
    print join(" ", "Matches:", @m, "\n");
    print join(" ", "Offsets:", @::o, "\n\n");
}
Could that be explained by closures too? I am still examining the re 'debug' output.
| [reply] [d/l] |
|
|
#!/usr/bin/perl
use strict;
use warnings;
sub foo {
    my $window = "a b X20 c X5 d e X17 X12";
    local our @o;
    my @m = $window =~ m/(X\d+(?{push @o, pos}))/g;
    print "Matches: @m,\n";
    print "Offsets: @o,\n\n";
}
foo;
foo;
foo;
__END__
| [reply] [d/l] [select] |
Could that be explained by closures too? I am still examining the re 'debug' output.
The debug output clearly shows you were correct in your first post: the matching does indeed work. The problem (which is definitely cured by using a global @o) is that your regex code was pushing onto the wrong @o, one no longer in any scope accessible to non-perl-deities.
| [reply] [d/l] [select] |
|
|
Why does that code block (as rhesa puts it) "get closed over?"
Because the current code implementing both (?{}) and (??{}) is a hack. The code is compiled once for performance reasons. Unfortunately, it's not simple to make it not a closure without a performance penalty. The issue you have to think about is that the code could execute in almost any context due to qr//, so the variable binding needs to occur at match start and needs to handle the case where there are no variables with the appropriate names to bind to, etc, etc.
You can see other warts in the implementation by making certain kinds of syntax error in the code: the error message will be distinctly unhelpful, and again, apparently it's a real bitch to fix.
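To make the qr// point concrete, here is a minimal sketch, assuming a reasonably recent perl. The pattern is compiled at file scope and only run later inside a sub, so at compile time the sub's lexicals are not even in scope to bind to. @hits, $re and scan() are hypothetical names, and a package variable with local is used to sidestep the binding question entirely.

```perl
use strict;
use warnings;

our @hits;    # package variable; the code block refers to @main::hits

# Compiled here, well away from where it is eventually matched.
my $re = qr/(X\d+)(?{ push @hits, pos })/;

sub scan {
    my ($text) = @_;
    local @hits;                        # fresh, dynamically scoped value per call
    my @matches = $text =~ /$re/g;      # list context: one capture per match
    return ( [@matches], [@hits] );
}

for my $run ( 1 .. 2 ) {
    my ( $m, $o ) = scan("a b X20 c X5 d e X17 X12");
    print "Matches: @$m\n";             # Matches: X20 X5 X17 X12
    print "Offsets: @$o\n";             # Offsets: 7 12 20 24
}
```

Because the code block names a package variable, it finds whatever value is current at match time, and local gives each call a clean slate.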
---
$world=~s/war/peace/g
| [reply] |
|
|
The only way to fix this is to take the Perl 6 approach and clean up the compilation semantics of regexes. All of these hacks are a direct result of trying to treat regexes as strings rather than as a real minilanguage. Interpolating variables into regexes prior to compilation is simply wrong. It destroys any semblance of lexical scoping for both variable bindings and error message location. If I were going to fix this in Perl 5, I'd make a lexically scoped pragma to compile regexes immediately with sane variable bindings to avoid all this two-pass compilation bogosity. Any other approach is just bandaids.
| [reply] |
|
|
Re: Multiple uses of (?{ code }) do not appear to be called
by almut (Canon) on Dec 29, 2006 at 15:45 UTC
|
Yet another variant, which allows you to keep your original my @o.
The idea is to wrap a kind of "throw-away" subroutine around the push ... code.
The subroutine is created anew every time the ?{...} code is being called.
sub foo {
    my $window = "a b X20 c X5 d e X17 X12";
    my @o;
    my @m = ($window =~ m/(X\d+(?{(sub {push @o, @_})->(pos)}))/g);
    print join(" ", "Matches:", @m, "\n");
    print join(" ", "Offsets:", @o, "\n\n");
}
It thus works around the closure effect of "binding" the first
instance of the lexical variable @o to the ?{...} code.
(I'm not saying that this is in any way better than the other
suggestions (in particular blazar's suggestion using our @o)...
but heh, Perl would only be half as much fun if there weren't always
more than one way to do it :)
| [reply] [d/l] [select] |
|
|
I did try something similar using external anonymous subs but that didn't work. Interesting that it works if the sub is inline.
| [reply] |
|
|
sub foo {
    my $window = "a b X20 c X5 d e X17 X12";
    my @o;
    my $push_o = sub { push @o, @_ };
    print "subN=$push_o\n";
    my @m = ($window =~ m/(X\d+(?{print "sub1=$push_o\n"; $push_o->(pos)}))/g);
    print join(" ", "Matches:", @m, "\n");
    print join(" ", "Offsets:", @o, "\n\n");
}
Output:
subN=CODE(0x8119060)
sub1=CODE(0x8119060)
sub1=CODE(0x8119060)
sub1=CODE(0x8119060)
sub1=CODE(0x8119060)
Matches: X20 X5 X17 X12
Offsets: 7 12 20 24
subN=CODE(0x8131ac0)
sub1=CODE(0x8119060)
sub1=CODE(0x8119060)
sub1=CODE(0x8119060)
sub1=CODE(0x8119060)
Matches: X20 X5 X17 X12
Offsets:
subN=CODE(0x8119d50)
sub1=CODE(0x8119060)
sub1=CODE(0x8119060)
sub1=CODE(0x8119060)
sub1=CODE(0x8119060)
Matches: X20 X5 X17 X12
Offsets:
Despite three subroutines being created (--> "subN=..."), the one
actually being called within ?{...} is always the same, first
one (--> "sub1=..."). This sub in turn is bound to the first
instance of the variable @o ...
| [reply] [d/l] [select] |
|
|
There is an appreciation of this at
http://use.perl.org/~jjore/journal/32016
I like it too. Unfortunately, on perl-5.8.3 I get a segmentation fault.
| [reply] |
|
|
It's much easier, safer and faster to use package variables when dealing with regexps.
sub foo {
    my $window = "a b X20 c X5 d e X17 X12";
    local our @o;
    my @m = ($window =~ m/(X\d+(?{ push @o, pos }))/g);
    print join(" ", "Matches:", @m, "\n");
    print join(" ", "Offsets:", @o, "\n\n");
}
Update: Oops, this has already been said elsewhere in the thread. I didn't enter this thread at the root node.
| [reply] [d/l] |
Re: Multiple uses of (?{ code }) do not appear to be called
by quester (Vicar) on Dec 30, 2006 at 02:49 UTC
|
I think it is just a variation on the "will not stay shared" theme. If you insert a named subroutine that refers to @o, with use warnings; use strict; in effect, it will give you a warning:
use warnings;
use strict;
sub foo {
    my @o;
    sub demonstration { @o };
}
Variable "@o" will not stay shared at xxx.pl line 6.
Changing "use warnings" to "use diagnostics" prints the postcard version of the message:
Variable "@o" will not stay shared at xxx.pl line 6 (#1)
(W closure) An inner (nested) named subroutine is referencing a
lexical variable defined in an outer subroutine.
When the inner subroutine is called, it will probably see the value of
the outer subroutine's variable as it was before and during the *first*
call to the outer subroutine; in this case, after the first call to the
outer subroutine is complete, the inner and outer subroutines will no
longer share a common value for the variable. In other words, the
variable will no longer be shared.
Furthermore, if the outer subroutine is anonymous and references a
lexical variable outside itself, then the outer and inner subroutines
will never share the given variable.
This problem can usually be solved by making the inner subroutine
anonymous, using the sub {} syntax. When inner anonymous subs that
reference variables in outer subroutines are called or referenced, they
are automatically rebound to the current values of such variables.
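As a sketch of the fix the diagnostic suggests (outer() and $record are made-up names), an anonymous inner sub is re-created on every call to the outer sub and so sees the current @o each time:

```perl
use strict;
use warnings;

sub outer {
    my @o;
    # Anonymous sub: created anew, and bound to the current @o, on every call.
    my $record = sub { push @o, @_ };
    $record->($_) for 1 .. 3;
    return scalar @o;
}

print outer(), "\n" for 1 .. 2;    # prints 3 both times
```

Had $record been a named sub declared inside outer(), only the first call would have shared @o with it.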
It's unfortunate that it doesn't seem to warn about that when the variable is used in (?{code}).
| [reply] [d/l] |
Re: Multiple uses of (?{ code }) do not appear to be called
by Anonymous Monk on Dec 29, 2006 at 12:44 UTC
|
| [reply] [d/l] |