johnrcomeau has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to use a zero-width "Look-Around Assertion", part of the extended regular expression syntax of Perl. Here's a little test code that isn't working as I expect:
my $string = '54W';
if ($string =~ /(?=4)W/x)  { warn "1 true\n"; }
if ($string =~ /(?=\d)W/x) { warn "2 true\n"; }
if ($string =~ /\dW/x)     { warn "3 true\n"; }

Only the 3rd regular expression shows as true. I don't understand why the '4' is not matching the (?=4) pattern or (?=\d) pattern in the first two tests.

Regards, John

Re: zero-width assertions for extended regular expressions
by smls (Friar) on May 26, 2014 at 19:10 UTC

    (?=\d) is a lookahead assertion.

    In order to check for the presence of a digit before the rest of the match, you need a lookbehind assertion: (?<=\d)

    See perlreref#EXTENDED-CONSTRUCTS for an overview of the different look-around constructs.

      Yep.
      $ perl -e 'my $string = "54W"; warn "1 true\n" if $string =~ /(?<=4)W/;'
      1 true
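To make the lookahead/lookbehind difference concrete, here is a minimal sketch that runs both forms against the OP's string and reports what was actually matched. The helper name matched_text is made up for illustration; it is not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the text of the
# overall match ($&), or undef if the regex does not match.
sub matched_text {
    my ($string, $regex) = @_;
    return $string =~ $regex ? $& : undef;
}

my $string = '54W';

# Lookahead: the character at the current position would have to be
# both a digit and a W, which is impossible.
my $ahead = matched_text($string, qr/(?=\d)W/);

# Lookbehind: the character just before the current position must be
# a digit, and the character at the current position must be a W.
my $behind = matched_text($string, qr/(?<=\d)W/);

print defined $ahead  ? "lookahead matched '$ahead'\n"   : "lookahead failed\n";
print defined $behind ? "lookbehind matched '$behind'\n" : "lookbehind failed\n";
```

The lookbehind version reports only 'W' as the match, since the assertion checks the digit without making it part of the matched text.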
Re: zero-width assertions for extended regular expressions
by AnomalousMonk (Archbishop) on May 26, 2014 at 22:50 UTC

    Another way to think of it is that (zero-width) look-around assertions act from positions between characters, so in the string
                                 5 4 W
                                    ^^
                                    ||
        (?=4) looks forward from here|
                                     |
             at the character (W) here

    So is W the same as 4? It is not. (Actually, the regex engine tries every possible position since it is not constrained to do otherwise, but even so, W is never 4.) And likewise for (?=\d): W is never a digit.

    Update: Also consider:

    c:\@Work\Perl\monks>perl -wMstrict -le "my $s = '54W'; ;; print 'match' if $s =~ m{ (?=4) . W }xms; "
    match
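As a small sketch of why that last pattern matches, the offsets in @- and @+ show that the lookahead consumed nothing and the dot is what stepped over the 4. The helper name match_span is invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns an array ref of
# [start offset, end offset, matched text], or undef if there is no match.
sub match_span {
    my ($string, $regex) = @_;
    return unless $string =~ $regex;
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    return [ $-[0], $+[0], substr($string, $-[0], $+[0] - $-[0]) ];
}

my $span = match_span('54W', qr{ (?=4) . W }xms);
printf "matched '%s' from offset %d to %d\n", $span->[2], $span->[0], $span->[1];
# prints: matched '4W' from offset 1 to 3
```

The match starts at offset 1, the same spot where (?=4) succeeded, so the assertion itself contributed zero characters.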

      Good commentary AnomalousMonk :)

      This is where something like rxrx shines over use re 'debug'; as it shows you how the matching goes without the interference of optimizations :)

      So hopefully you can see from the rxrx output that zero-width assertions match from the current position without advancing that pos()ition. "Look-behind matches text up to the current match position, look-ahead matches text following the current match position." The position is pos(), and it is advanced only by patterns that actually consume characters.

      So in the OP's regex, at pos 0 the look-ahead checks whether '5' is 4, which it's not. The engine then bumps along to pos 1 and checks whether '4' is 4, and it is, but then it fails to match a W at pos 1, because the look-ahead didn't advance the pos()ition, and there is a 4 at pos 1 and not a W ..... :)
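The walk-through above can be imitated by hand. This is only a rough sketch: it tests each starting offset separately with substr, which simplifies what the real engine does (the engine applies optimizations and would not bother trying W once the look-ahead has failed). The helper name trace_positions is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): for every starting offset,
# record whether the look-ahead (?=4) succeeds there and whether a
# literal W sits there. Each row is [offset, lookahead_ok, w_ok].
sub trace_positions {
    my ($string) = @_;
    my @rows;
    for my $pos (0 .. length($string) - 1) {
        my $rest = substr($string, $pos);          # text from this offset on
        my $lookahead_ok = $rest =~ /\A(?=4)/ ? 1 : 0;
        my $w_ok         = $rest =~ /\AW/     ? 1 : 0;
        push @rows, [ $pos, $lookahead_ok, $w_ok ];
    }
    return @rows;
}

for my $row (trace_positions('54W')) {
    my ($pos, $lookahead_ok, $w_ok) = @$row;
    printf "pos %d: (?=4) %s, W %s\n",
        $pos,
        $lookahead_ok ? 'succeeds' : 'fails',
        $w_ok         ? 'matches'  : 'fails';
}
# prints:
# pos 0: (?=4) fails, W fails
# pos 1: (?=4) succeeds, W fails
# pos 2: (?=4) fails, W matches
```

No offset satisfies both conditions at once, which is why /(?=4)W/ can never match this string.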