RK has asked for the wisdom of the Perl Monks concerning the following question:

I want to match "bar test" not preceded by "foo" and also account for multiple whitespace characters.

my $regex = "(?<!foo)\\s+bar\\s+test";
my $txt   = "foo    bar \ntest";
if ($txt =~ /($regex)/i) {
    print "match";
}
else {
    print "not a match";
}

I would expect it to not match but it does. What is wrong with the regex?

Re: look behind with multiple whitespace characters
by AnomalousMonk (Archbishop) on Apr 05, 2015 at 17:46 UTC

    The regex  qr{(?<!foo)\s+bar} will not match
              'foo    bar'
                  ^^
                  ||
       no match --++-- can match here
    at the first position because it's preceded by 'foo', but at the second position it's not preceded by 'foo' but by a whitespace character, so it can match!
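To see this in practice, one can print the offset at which the match starts. The snippet below is only an illustrative sketch; the variable names are made up for the example, and `@-` is Perl's built-in array of match start offsets.

```perl
use strict;
use warnings;

my $s = 'foo    bar';    # 'foo', four spaces, 'bar'

my ($start, $matched);
if ($s =~ /((?<!foo)\s+bar)/) {
    $start   = $-[0];    # offset where the overall match begins
    $matched = $1;       # the text that was actually matched
    print "matched '$matched' starting at offset $start\n";
}
```

The match begins at offset 4, one character past the end of 'foo', and covers only the last three spaces plus 'bar'.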

    Update: If you have Perl version 5.10+, here's one way to simulate a variable-width negative look-behind:

    c:\@Work\Perl>perl -wMstrict -le
    "use 5.010;
     ;;
     my $s = 'foo 1 foo 2 foo 3 4 x 567 foo foo 8 foo9';
     my @matches = $s =~ m{ (?> foo \s+ \d+ (*SKIP)(*FAIL))? \s+ \d+}xmsg;
     printf qq{'$_' } for @matches;
    "
    ' 4' ' 567'
    See Special Backtracking Control Verbs in perlre for  (*SKIP) (*FAIL) and friends.
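Applied to the original question, the same (*SKIP)(*FAIL) idea might look like the sketch below. It assumes that "preceded by foo" means 'foo' followed by any amount of whitespace, and the sub name `is_bar_test` is hypothetical, introduced only for this example.

```perl
use strict;
use warnings;
use 5.010;    # backtracking control verbs need Perl 5.10+

# First alternative: consume the unwanted "foo bar test" form, then
# (*SKIP)(*FAIL) rejects it and keeps the engine from retrying inside it.
# Second alternative: the form that should actually match.
my $re = qr{
    foo \s+ bar \s+ test (*SKIP)(*FAIL)
  |
    \s+ bar \s+ test
}xi;

sub is_bar_test { return $_[0] =~ $re ? 1 : 0 }

say is_bar_test("foo    bar \ntest") ? 'match' : 'not a match';
say is_bar_test("baz    bar \ntest") ? 'match' : 'not a match';
```

The first string is rejected because the whole 'foo ... bar ... test' span is skipped; the second matches through the plain alternative.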
    See also Look Behind issues and Another Look behind for further recent discussion of the variable-width negative look-behind issue. (Update: Variable-width positive look-behind is nicely handled by the  \K operator.)
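For the positive case, a minimal sketch of `\K` (Perl 5.10+) could look like this; the sample string and variable names are invented for illustration.

```perl
use strict;
use warnings;
use 5.010;    # \K needs Perl 5.10+

my $s = 'foo 1 foo   2 bar 3';

# \K drops everything matched so far from the reported match, so
# "foo\s+" behaves like a variable-width positive look-behind.
my @nums = $s =~ /foo\s+\K\d+/g;

say "@nums";
```

Only the digits that follow 'foo' plus whitespace are returned; the '3' after 'bar' is ignored.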


    Give a man a fish:  <%-(-(-(-<

Re: look behind with multiple whitespace characters
by Anonymous Monk on Apr 05, 2015 at 17:49 UTC
    Run it through rxrx and see what happens, or even
    $ perl -Mre=debug -e" $_ =qq{foo    bar \ntest}; m/(?<!foo)\s+bar\s+test/g "
    Compiling REx "(?<!foo)\s+bar\s+test"
    Final program:
       1: UNLESSM[-3] (7)
       3:   EXACT <foo> (5)
       5:   SUCCEED (0)
       6: TAIL (7)
       7: PLUS (9)
       8:   SPACE (0)
       9: EXACT <bar> (11)
      11: PLUS (13)
      12:   SPACE (0)
      13: EXACT <test> (15)
      15: END (0)
    floating "test" at 5..2147483647 (checking floating) minlen 9
    Guessing start of match in sv for REx "(?<!foo)\s+bar\s+test" against "foo    bar %ntest"
    Found floating substr "test" at offset 12...
    Guessed: match at offset 0
    Matching REx "(?<!foo)\s+bar\s+test" against "foo    bar %ntest"
       0 <> <foo bar>            |  1:UNLESSM[-3](7)
       0 <> <foo bar>            |  7:PLUS(9)
                                 |     SPACE can match 0 times out of 2147483647...
                                 |     failed...
       1 <f> <oo bar >           |  1:UNLESSM[-3](7)
       1 <f> <oo bar >           |  7:PLUS(9)
                                 |     SPACE can match 0 times out of 2147483647...
                                 |     failed...
       2 <fo> <o bar >           |  1:UNLESSM[-3](7)
       2 <fo> <o bar >           |  7:PLUS(9)
                                 |     SPACE can match 0 times out of 2147483647...
                                 |     failed...
       3 <foo> < bar %n>         |  1:UNLESSM[-3](7)
       0 <> <foo bar>            |  3:  EXACT <foo>(5)
       3 <foo> < bar %n>         |  5:  SUCCEED(0)
                                 |      subpattern success...
                                 |     failed...
       4 <foo > < bar %nt>       |  1:UNLESSM[-3](7)
       1 <f> <oo bar >           |  3:  EXACT <foo>(5)
                                 |      failed...
       4 <foo > < bar %nt>       |  7:PLUS(9)
                                 |     SPACE can match 3 times out of 2147483647...
       7 <o > <bar %ntest>       |  9:  EXACT <bar>(11)
      10 < bar> < %ntest>        | 11:  PLUS(13)
                                 |      SPACE can match 2 times out of 2147483647...
      12 < bar %n> <test>        | 13:  EXACT <test>(15)
      16 < bar %ntest> <>        | 15:  END(0)
    Match successful!
    Freeing REx: "(?<!foo)\s+bar\s+test"

    First it tries to match right after "foo", which isn't allowed, but then it tries one character later, after "foo ", which is allowed since the three characters before that point are "oo " and not "foo".

    Basically, you need to rethink what you're trying to achieve, see Look Behind issues
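One possible reading of the requirement is "a run of whitespace followed by bar test, where the run as a whole does not follow foo". Under that assumption (it is a guess at the intent, not something stated in the question), adding a second fixed-width look-behind that stops the match from starting in the middle of the whitespace run may be enough. The sample strings and variable names below are made up for illustration.

```perl
use strict;
use warnings;

# (?<!\s) forces the match to begin at the first whitespace character
# of the run, so (?<!foo) always inspects what precedes the whole run.
my $re = qr/(?<!foo)(?<!\s)\s+bar\s+test/i;

my @samples = ("foo    bar \ntest", "baz    bar \ntest", "foo bar test", " bar test");

my %result;
for my $txt (@samples) {
    $result{$txt} = $txt =~ $re ? 'match' : 'not a match';
    (my $shown = $txt) =~ s/\n/\\n/g;    # make the newline visible
    print "'$shown': $result{$txt}\n";
}
```

Both look-behinds stay fixed-width, so this also works on Perl versions older than 5.10.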

Re: look behind with multiple whitespace characters
by Marshall (Canon) on Apr 07, 2015 at 05:18 UTC
    This look behind regex stuff is complex.

    There might be an easier way to code this. If you could give, say, 10 example records, it would be easier to work out.

    As you say, this doesn't match:
        foo bar test
    But I guess that this should?:
        bar test
    And this match?:
        foo bar test