zero-width assertions for extended regular expressions

johnrcomeau has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to use a zero-width "Look-Around Assertion", part of Perl's extended regular expression syntax. Here's a bit of test code that isn't working as I expect:

my $string = '54W';
if ($string =~ /(?=4)W/x) {
    warn "1 true\n";
}
if ($string =~ /(?=\d)W/x) {
    warn "2 true\n";
}
if ($string =~ /\dW/x) {
    warn "3 true\n";
}

Only the 3rd regular expression shows as true. I don't understand why the '4' is not matching the `(?=4)` pattern or the `(?=\d)` pattern in the first two tests. Regards, John


Re: zero-width assertions for extended regular expressions by smls (Friar) on May 26, 2014 at 19:10 UTC
`(?=\d)` is a lookahead assertion. In order to check for the presence of a digit before the rest of the match, you need a lookbehind assertion: `(?<=\d)` See perlreref#EXTENDED-CONSTRUCTS for an overview of the different look-around constructs.
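As a minimal sketch of the difference, the snippet below runs both kinds of assertion against the OP's string. The variable names (`$ahead`, `$behind`, `$matched`) are made up for illustration and do not come from the thread.

```perl
use strict;
use warnings;

my $string = '54W';

# Look-ahead: asserts that the text FOLLOWING the current position is a digit.
# Immediately before the W, the next character is the W itself, so this fails.
my $ahead = $string =~ /(?=\d)W/ ? 1 : 0;

# Look-behind: asserts that the text PRECEDING the current position is a digit.
# Immediately before the W, the previous character is 4, so this succeeds.
my $behind = $string =~ /(?<=\d)W/ ? 1 : 0;

# The assertion is zero-width: only the W ends up in the captured text.
my ($matched) = $string =~ /(?<=\d)(W)/;

print "look-ahead:   $ahead\n";     # 0
print "look-behind:  $behind\n";    # 1
print "matched text: $matched\n";   # W
```

Note that the digit is checked but never becomes part of the match, which is the point of a zero-width assertion.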
Re^2: zero-width assertions for extended regular expressions by Laurent_R (Canon) on May 26, 2014 at 21:14 UTC
Yep.
`$ perl -e 'my $string = "54W"; warn "1 true\n" if $string =~ /(?<=4)W/;'`
`1 true`
Re: zero-width assertions for extended regular expressions by AnomalousMonk (Archbishop) on May 26, 2014 at 22:50 UTC
Another way to think of it is that (zero-width) look-around assertions act from positions between characters, so in the string

    5 4 W
       ^^
       ||
       |+-- at the character (W) here
       +--- (?=4) looks forward from here

So is `W` the same as `4`? It is not. (Actually, the regex checks every possible position since it is not constrained to do otherwise, but even so, `W` is never `4`.) And for `(?=\d)` likewise.

Update: Also consider:

`c:\@Work\Perl\monks>perl -wMstrict -le "my $s = '54W'; ;; print 'match' if $s =~ m{ (?=4) . W }xms; "`
`match`
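To make the "positions between characters" idea concrete, here is a small sketch that reports where the match from the update starts and ends, using Perl's built-in `@-` and `@+` offset arrays. The variable names `$start`, `$end` and `$text` are assumptions for the example.

```perl
use strict;
use warnings;

my $s = '54W';
my ($start, $end, $text);

if ($s =~ m{ (?=4) . W }xms) {
    $start = $-[0];    # offset where the overall match begins
    $end   = $+[0];    # offset just past the end of the match
    $text  = substr($s, $start, $end - $start);
}

# The look-ahead succeeds at offset 1 (between 5 and 4) without consuming
# anything; the dot then consumes the 4, and W matches the W.
print "start=$start end=$end text=$text\n";    # start=1 end=3 text=4W
```

The match begins at the same offset where the look-ahead was tested, since the assertion itself takes up no characters.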
Re^2: zero-width assertions for extended regular expressions (rxrx) by Anonymous Monk on May 27, 2014 at 07:41 UTC
Good commentary AnomalousMonk :) This is where something like rxrx shines over `use re 'debug';` as it shows you how the matching goes without the interference of optimizations :)

[rxrx output not shown, 3 kB]

So hopefully you can see from the rxrx output that zero-width assertions match from the current position without advancing that pos()ition. "Look-behind matches text up to the current match position, look-ahead matches text following the current match position." The position is pos(), and it is advanced by matching patterns.

So in the OP's regex, at position 0 the look-ahead checks whether '5' is 4, which it is not. The engine then bumps along to position 1 and checks whether '4' is 4, which it is. But then it fails to match a W at position 1, because the look-ahead didn't advance the pos()ition, and there is a 4 at position 1, not a W :)
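One way to watch this happen without rxrx is to step through the string by hand with `/g`, `\G` and `pos()`. This is only a sketch of the idea, not what the regex engine does internally, and the variable names are made up for the example.

```perl
use strict;
use warnings;

my $s = '54W';

# A bare look-ahead under /g succeeds at the first position followed by a 4.
my $found = $s =~ /(?=4)/g;
my $pos_after_lookahead = pos($s);    # still in front of the 4

# \G anchors at pos(). There is a 4 there, not a W, so this fails.
# The /c flag keeps pos() where it was after a failed match.
my $w_here = $s =~ /\GW/gc ? 1 : 0;
my $pos_after_failure = pos($s);

# Consuming the 4 explicitly lets the W match, and pos() finally advances.
my $four_w = $s =~ /\G4W/gc ? 1 : 0;
my $pos_after_match = pos($s);

print "after look-ahead: pos=$pos_after_lookahead\n";                 # pos=1
print "W at pos? $w_here (pos=$pos_after_failure)\n";                 # 0 (pos=1)
print "4W at pos? $four_w (pos=$pos_after_match)\n";                  # 1 (pos=3)
```

Only the pattern that actually consumes characters moves `pos()`; the successful look-ahead leaves it sitting in front of the 4.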