miguelnyc703 has asked for the wisdom of the Perl Monks concerning the following question:

Can someone please explain why this is matching at all?

    $ echo 'CME.b/ESM8' | grep -P '^CME\.b?[^/.]'
    CME.b/ESM8

In the string, 'CME.' is a literal match (ok), followed by an optional 'b' (ok, it is present), but then the forward slash '/' should not match the negated character set as this is clearly saying "anything except / or ." The above was also tested using an actual Perl script:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $string = 'CME.b/ESM8';
    if ( $string =~ /^CME\.b?[^\/.]/ ) {
        print "yes\n";
    }
    else {
        print "no\n";
    }

but that also indicates a match:

    $ ./foo.pl
    yes

I've been writing Perl regexes for years but this is really puzzling me. Thanks! Miguel

Replies are listed 'Best First'.
Re: Strange negated character class behavior
by tybalt89 (Monsignor) on Nov 08, 2018 at 22:56 UTC

    [^\/.] matches the 'b'

Re: Strange negated character class behavior (updated)
by AnomalousMonk (Archbishop) on Nov 08, 2018 at 23:10 UTC

    What tybalt89 said. This can be clearly seen by capturing and printing the matching substrings of the regex components:

    c:\@Work\Perl\monks>perl -wMstrict -le "my $string = 'CME.b/ESM8'; print qq{'$1' '$2'} if $string =~ /^(CME\.b?)([^\/.])/; "
    'CME.' 'b'

    Update:

    ... please explain why ...
    The regex engine tries its best to make an overall match any which way it can. The b? is specified as optional. It initially matches the 'b' character in the string, but the subsequent required [^/.] then fails for the reason noted in the OP. The regex engine then backtracks and lets b? match nothing instead. That leaves the 'b' character for [^/.] to match, and the overall match succeeds.
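
    The backtracking described above can be made visible with a minimal sketch that puts b? in its own capture group. The variable names $opt_b and $class are illustrative and not from the original post; the regex is otherwise the OP's.

```perl
use strict;
use warnings;

my $string = 'CME.b/ESM8';

# Same regex as in the OP, but with b? and the character class
# captured separately so we can see what each one ended up matching.
my ($opt_b, $class) = $string =~ /^CME\.(b?)([^\/.])/;

print "b? matched:    '$opt_b'\n";   # empty: b? gave the 'b' back
print "[^/.] matched: '$class'\n";   # the 'b' itself
```

    After backtracking, the optional group holds the empty string and the character class holds the 'b'.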

    Question: What would have happened if the b match had not been optional (fairly obvious), or if it had been a b?+ possessive (available in Perl versions 5.10+) optional match? (Update: Prior to version 5.10, the possessive quantifier modification effect can be achieved by wrapping the expression in a (?>...) "atomic" grouping, so (?>b?) would work exactly the same.)
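
    One way to check the answer to the question is to run each variant against the OP's string. This is only a sketch: the labels and the %result hash are made up for illustration, and the possessive form needs Perl 5.10 or later.

```perl
use strict;
use warnings;
use 5.010;    # possessive quantifiers need 5.10+

my $string = 'CME.b/ESM8';

# Each entry: a label and one variant of the OP's regex.
my @variants = (
    [ 'optional b?'    => qr/^CME\.b?[^\/.]/     ],
    [ 'required b'     => qr/^CME\.b[^\/.]/      ],
    [ 'possessive b?+' => qr/^CME\.b?+[^\/.]/    ],
    [ 'atomic (?>b?)'  => qr/^CME\.(?>b?)[^\/.]/ ],
);

my %result;
for my $v (@variants) {
    my ($label, $re) = @$v;
    $result{$label} = $string =~ $re ? 'match' : 'no match';
    printf "%-16s %s\n", $label, $result{$label};
}
```

    Only the plain optional form matches. The other three refuse to give the 'b' back, so [^/.] is left facing the '/' and the match fails.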


    Give a man a fish:  <%-{-{-{-<

Re: Strange negated character class behavior
by ikegami (Patriarch) on Nov 09, 2018 at 13:19 UTC

    Solutions:

    ^CME\.b?+[^/.]
    or
    ^CME\.(?:b[^/.]|[^b/.])
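
    A quick sketch to compare the two solutions side by side in Perl. Apart from the OP's 'CME.b/ESM8', the sample strings are made-up test inputs, and the variable names are illustrative.

```perl
use strict;
use warnings;
use 5.010;    # possessive quantifiers need 5.10+

my $possessive  = qr{^CME\.b?+[^/.]};
my $alternation = qr{^CME\.(?:b[^/.]|[^b/.])};

# First sample is the OP's string; the rest are hypothetical inputs.
my @samples = ('CME.b/ESM8', 'CME.bX', 'CME.X', 'CME./ESM8', 'CME.b', 'CME.bb');

my %seen;
for my $s (@samples) {
    my $poss = $s =~ $possessive  ? 1 : 0;
    my $alt  = $s =~ $alternation ? 1 : 0;
    $seen{$s} = [ $poss, $alt ];
    printf "%-12s possessive=%d alternation=%d\n", $s, $poss, $alt;
}
```

    Both patterns agree on every sample here: they reject the OP's string and accept 'CME.bX', 'CME.X' and 'CME.bb'.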

    Note that you seem to think grep -P uses Perl, but it uses PCRE. There are some differences between the two.

Re: Strange negated character class behavior (re debug rxrx)
by Anonymous Monk on Nov 09, 2018 at 03:43 UTC
    Add use re 'debug'; to the script, or run the program as perl -Mre=debug ...pl, to see how the regex matches and backtracks.

    You can also use rxrx to step through the same regex with colored console output.

Re: Strange negated character class behavior
by Anonymous Monk on Nov 09, 2018 at 18:33 UTC
    Thanks everyone for your input! I get it now. Much appreciated