Re^2: Multiple uses of (?{ code }) do not appear to be called
by rhesa (Vicar) on Dec 29, 2006 at 12:56 UTC
Yep. Adding a print pos() inside the (?{code}) block shows that it does give the right pos values:
Moving the declaration of @o outside of sub foo makes it clear:
In other words, the @o in the (?{code}) block has been closed over, while the one in the print statement is a fresh array. The closed array is no longer accessible outside the regexp.
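As a rough analogy only (this is not the regex engine's actual mechanism), a named sub nested inside another sub shows the same kind of capture: it is compiled once and stays bound to the first instance of the outer lexical. The names outer and inner below are made up for illustration.

```perl
use strict;
use warnings;
no warnings 'closure';    # silence 'Variable "@o" will not stay shared'

sub outer {
    my @o;                          # a fresh @o on every call to outer()
    sub inner { push @o, 'hit' }    # compiled once; bound to the first @o only
    inner();
    return scalar @o;               # what outer() itself can see
}

print outer(), "\n";    # 1: on the first call both names refer to the same array
print outer(), "\n";    # 0: inner() pushed onto the old, now unreachable @o
```

With warnings left on, perl reports the capture at compile time, which is a useful hint that a lexical is being closed over unintentionally.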
A naughty solution might be the following (abusing something of a bug in perl; this may break in future versions):
sub foo {
    my $window = "a b X20 c X5 d e X17 X12";
    my @o = () if pos; # conditional declaration makes @o "static"
    @o = (); # reset it always
    my @m = ($window =~ m/(X\d+(?{push @o, pos()}))/g);
    print "Matches: @m\n";
    print "Offsets: @o\n";
    print "\n";
}
__END__
Matches: X20 X5 X17 X12
Offsets: 7 12 20 24
Matches: X20 X5 X17 X12
Offsets: 7 12 20 24
Matches: X20 X5 X17 X12
Offsets: 7 12 20 24
|
Ha ha, looks like you and I were on the same path. I think I'll avoid using the bug, though it is useful to know about.
|
{ # limit scope
    my @o;
    sub foo {
        my $window = "a b X20 c X5 d e X17 X12";
        @o = ();
        my @m = ( $window =~ m/(X\d+(?{push @o, pos()}))/g );
        print "Matches: @m\n";
        print "Offsets: @o\n";
        print "\n";
    }
}
This will make sure that only foo() can see @o.
The second workaround is basically a rewrite of your code. It doesn't solve the general issue, but it avoids the use of the complicated (?{BLOCK}) feature for your particular case:
sub foo {
    my $window = "a b X20 c X5 d e X17 X12";
    my( @o, @m );
    while( $window =~ m/(X\d+)/g ) {
        push @m, $1;
        push @o, $+[0]; # end offset of the whole match
    }
    print "Matches: @m\n";
    print "Offsets: @o\n";
    print "\n";
}
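If the start of each match is wanted as well, the same loop can read @- alongside @+. A minimal sketch, assuming a hypothetical helper named token_spans that is not part of the original post:

```perl
use strict;
use warnings;

# Hypothetical helper: returns [text, start, end] triples for every
# X<digits> token found in the given string.
sub token_spans {
    my ($window) = @_;
    my @spans;
    while ( $window =~ m/(X\d+)/g ) {
        # $-[0] is the start offset of the whole match, $+[0] its end offset
        push @spans, [ $1, $-[0], $+[0] ];
    }
    return @spans;
}

for my $span ( token_spans("a b X20 c X5 d e X17 X12") ) {
    printf "%-4s start=%d end=%d\n", @$span;
}
```

The end offsets come out as 7, 12, 20 and 24, the same values the (?{BLOCK}) version collects with pos().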
Re^2: Multiple uses of (?{ code }) do not appear to be called
by bsdz (Friar) on Dec 29, 2006 at 12:56 UTC
I must admit I am not too hot on closures either, but another interesting observation is that making @o global appears to cure the problem.
my @o;
foo();
foo();
foo();
sub foo {
    my $window = "a b X20 c X5 d e X17 X12";
    @::o = ();
    my @m = ($window =~ m/(X\d+(?{push @::o, pos()}))/g);
    print join(" ", "Matches:", @m, "\n");
    print join(" ", "Offsets:", @::o, "\n\n");
}
Could that be explained by closures too? I am still examining the re 'debug' output.
|
#!/usr/bin/perl
use strict;
use warnings;
sub foo {
    my $window = "a b X20 c X5 d e X17 X12";
    local our @o;
    my @m = $window =~ m/(X\d+(?{push @o, pos}))/g;
    print "Matches: @m,\n";
    print "Offsets: @o,\n\n";
}
foo;
foo;
foo;
__END__
|
I'll be damned. It feels a little contradictory but it works. Maybe it's time I re-read all that Perl literature again!
|
Heh, nifty. But why not simply say our @o = ();? Surely the scope is still limited to the sub. I'm a bit puzzled by how local interacts with our here.
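For what it's worth, one visible difference shows up when something outside the sub already uses the same package variable. A small sketch (the sub names with_local and without_local are made up for illustration): our only gives a lexically scoped name for the global @main::o, while local saves the global's current value and restores it when the sub exits.

```perl
use strict;
use warnings;

@main::o = ('caller data');    # pretend other code already uses @main::o

sub with_local {
    local our @o;      # temporarily empty @main::o; old value restored on exit
    push @o, 1, 2;
    return scalar @o;
}

sub without_local {
    our @o = ();       # same global, but the old contents are overwritten
    push @o, 1, 2;
    return scalar @o;
}

with_local();
print "after with_local:    @main::o\n";    # caller data
without_local();
print "after without_local: @main::o\n";    # 1 2
```

So our @o = (); works just as well inside foo, but it leaves the global modified afterwards, whereas local our @o; cleans up after itself.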
|
Could that be explained by closures too? I am still examining the re 'debug' output.
The debug output clearly shows you were correct in your first post: the matching does indeed function. The problem (which is definitely cured by using a global @o) is that your regex code was pushing onto the wrong @o, one no longer in any scope accessible to non-perl-deities.
Re^2: Multiple uses of (?{ code }) do not appear to be called
by demerphq (Chancellor) on Dec 30, 2006 at 14:58 UTC
Why does that code block (as rhesa puts it) "get closed over?"
Because the current code implementing both (?{}) and (??{}) is a hack. The code is compiled once for performance reasons. Unfortunately it's not simple to make it not a closure without a performance penalty. The issue you have to think about is that the code could execute in almost any context due to qr//, so the variable binding needs to occur at match start and needs to handle the case where there are no variables with the appropriate names to bind to, etc, etc.
You can see other warts in the implementation by making certain kinds of syntax error in the code: the error message will be distinctly unhelpful, and again, apparently it's a real bitch to fix.
---
$world=~s/war/peace/g
|
The only way to fix this is to take the Perl 6 approach and clean up the compilation semantics of regexes. All of these hacks are a direct result of trying to treat regexes as strings rather than as a real minilanguage. Interpolating variables into regexes prior to compilation is simply wrong. It destroys any semblance of lexical scoping for both variable bindings and error message location. If I were going to fix this in Perl 5, I'd make a lexically scoped pragma to compile regexes immediately with sane variable bindings to avoid all this two-pass compilation bogosity. Any other approach is just band-aids.
|
Maybe I'm not seeing this properly. Do you mean the issue of /$foo/ where $foo is a string? If so, it's not clear to me how changing that makes the issue of rebinding compiled code into the pad of its usage context any easier.
I can see how changing how variable interpolation is handled in regexes would make for a lot more flexibility in other respects, but it's not clear how it helps the immediate issue of this thread... Can you explain a bit more please?
---
$world=~s/war/peace/g