
That wouldn't match the string "ALPHA, I AM"... Also, a character class can't deal with the "XL5 ALPHA" exception.

...which leads to what the author is looking for: the zero-width negative look-behind assertion "(?<!pattern)" (see perlre).
use strict;
use warnings;
while (<DATA>) {
  print /(?<!XL5 )(?<![#_])ALPHA/ ? "OK\n" : "NOT OK\n";
}
__DATA__
I AM ALPHA AND GOOD ALPHA
I AM ALPHA AND BETA
I AM ALPHA AND #ALPHA
I AM ALPHA AND _ALPHA
I AM ALPHA AND XL5 ALPHA
I AM #ALPHA
I AM _ALPHA
I AM XL5 ALPHA

Output:
OK
OK
OK
OK
OK
NOT OK
NOT OK
NOT OK
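To see both problems with the character class side by side, here is a minimal sketch that runs [^_#]ALPHA and the look-behind version against a few of the strings from this thread. The string list and the %result hash are just for illustration.

```perl
use strict;
use warnings;

# Test strings taken from the discussion (list chosen for illustration).
my @strings = ('ALPHA, I AM', 'I AM #ALPHA', 'I AM XL5 ALPHA', 'I AM ALPHA AND GOOD');
my %result;    # string => [character-class result, look-behind result]

for my $s (@strings) {
    # [^_#] must consume one character before ALPHA, so it fails when
    # ALPHA starts the string, and the space in "XL5 ALPHA" satisfies it.
    my $class = $s =~ /[^_#]ALPHA/ ? 1 : 0;

    # The look-behinds are zero-width: ALPHA at position 0 is accepted,
    # while "#", "_" or "XL5 " immediately before ALPHA rejects it.
    my $look = $s =~ /(?<!XL5 )(?<![#_])ALPHA/ ? 1 : 0;

    $result{$s} = [ $class, $look ];
    printf "%-20s class: %-9s look-behind: %s\n", $s,
        ( $class ? 'match' : 'no match' ), ( $look ? 'match' : 'no match' );
}
```

The class version misses "ALPHA, I AM" and wrongly accepts "I AM XL5 ALPHA"; the look-behind version gets both right.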

Re^3: Pattern matching when there are exception strings
by pboin (Deacon) on Sep 21, 2005 at 13:27 UTC

    You're right, and I'll take my lumps for being wrong.

    But what I haven't quite figured out is *why* [^_#]ALPHA wouldn't match on 'ALPHA, I AM'. Is it because the character class has to match _some_ character? My train of thought was that the negation would be primary. "Is there an underscore or a hash leading?" "No."

      The (?<!pattern) construct is called "zero-width" because it matches without consuming any characters. [] is not zero-width. It has to match one or more characters.
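One way to see "zero-width" in practice is to inspect what each pattern actually consumed. This sketch uses Perl's $& (the text of the last successful match) and $-[0] (the offset where that match started); the test strings are arbitrary.

```perl
use strict;
use warnings;

my ($class_match, $look_match, $start_offset);

# A character class consumes the character it matches, so it shows up in $&.
if ('xALPHA' =~ /[^_#]ALPHA/) {
    $class_match = $&;      # 'xALPHA': the 'x' was consumed
}

# A negative look-behind only inspects the text before the current position
# and consumes nothing, so $& holds just the literal part.
if ('xALPHA' =~ /(?<![_#])ALPHA/) {
    $look_match = $&;       # 'ALPHA'
}

# Because nothing has to precede ALPHA, the look-behind can match at offset 0.
if ('ALPHA' =~ /(?<![_#])ALPHA/) {
    $start_offset = $-[0];  # 0
}

print "character class matched '$class_match'\n";
print "look-behind matched '$look_match'\n";
print "look-behind match on 'ALPHA' starts at offset $start_offset\n";
```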
        [] is not zero-width. It has to match one or more characters.

        I'm only going to mention this to help avoid any confusion among newbies who may find this post sooner or later... Without a quantifier (*, +, {}, ?) it must match exactly one character.

        It might be clearer if you leave out assertions altogether and just show how character classes compare to literal characters. (If someone doesn't understand how character classes work, it probably isn't the right time to confuse them with zero-width assertions.) Start by explaining that /a/ is equivalent to /[a]/, progress by explaining that /[ab]/ is similarly equivalent except that it will match a 'b' as well, and then leap to the fact that /[^a]/ will simply match any character that is not an 'a'.
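The progression described above (/a/, then /[a]/, then /[ab]/, then /[^a]/) can be checked directly. A minimal sketch; the case table is made up for illustration, and the last case shows that even a negated class needs one character to be present.

```perl
use strict;
use warnings;

# Each entry: pattern source, string, whether it should match.
my @cases = (
    [ 'a',    'a', 1 ],  # literal 'a'
    [ '[a]',  'a', 1 ],  # one-character class: same as /a/
    [ '[ab]', 'b', 1 ],  # two-character class: 'a' or 'b'
    [ '[ab]', 'c', 0 ],
    [ '[^a]', 'c', 1 ],  # negated class: any one character except 'a'
    [ '[^a]', 'a', 0 ],
    [ '[^a]', '',  0 ],  # no character at all, so nothing to match
);

my $failures = 0;
for my $case (@cases) {
    my ($pat, $str, $want) = @$case;
    my $got = $str =~ /$pat/ ? 1 : 0;
    $failures++ if $got != $want;
    printf "/%s/ against '%s': %s\n", $pat, $str, $got ? 'match' : 'no match';
}
```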

        -sauoq
        "My two cents aren't worth a dime.";
        
      But what I haven't quite figured out is *why* [^_#]ALPHA wouldn't match on 'ALPHA, I AM'. Is it because the character class has to match _some_ character?
      Yes: it has to match exactly one character. You can, however, use quantifiers:
      /[A]B/;   # must be 'AB'
      /[A]?B/;  # 0 or 1, so 'AB' or 'B' or 'CB'
      /[A]+B/;  # 1 or more, so 'AB' or 'AAAB'
      /[A]*B/;  # 0 or more, so 'AB' or 'AAAB' or 'B' or 'CB'
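A small sketch that runs the four patterns above, unanchored as written, against the example strings and records what each one actually matched. The %matched hash and its "pattern|string" keys are just a convenience for this illustration. Note that 'CB' matches /[A]?B/ and /[A]*B/ only because the match can start at the 'B'.

```perl
use strict;
use warnings;

my @patterns = ('[A]B', '[A]?B', '[A]+B', '[A]*B');
my @strings  = ('AB', 'B', 'CB', 'AAAB');
my %matched;    # "pattern|string" => text matched, or undef for no match

for my $pat (@patterns) {
    for my $str (@strings) {
        # Unanchored, as in the examples: the match may start anywhere.
        my $m = $str =~ /$pat/ ? $& : undef;
        $matched{"$pat|$str"} = $m;
        printf "/%s/ against '%s': %s\n", $pat, $str,
            defined $m ? "matched '$m'" : 'no match';
    }
}
```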
      My train of thought was that the negation would be primary. "Is there an underscore or a hash leading?" "No."
      "Is there a character that is (NOT) an underscore or a hash?"
      The "NOT" is there iff there's a leading caret.
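Put in code, the question "is there a character that is not an underscore or a hash?" gets the answer "there is no character at all" when ALPHA starts the string, so the class fails. The sketch below also shows a different technique, plain alternation with the ^ anchor, that accepts start-of-string without a look-behind. It is only an illustration of the reasoning and still does nothing about the "XL5 " exception.

```perl
use strict;
use warnings;

my $s = 'ALPHA, I AM';

# The negated class needs one character before ALPHA; there is none here.
my $class_only = $s =~ /[^_#]ALPHA/ ? 1 : 0;

# Allowing "start of string" as an alternative to that one character
# restores the match.
my $with_anchor = $s =~ /(?:^|[^_#])ALPHA/ ? 1 : 0;

# Where a prefix character does exist, '#' is still rejected.
my $hash_prefix = '#ALPHA' =~ /(?:^|[^_#])ALPHA/ ? 1 : 0;

print '[^_#]ALPHA:        ', ($class_only  ? 'match' : 'no match'), "\n";
print '(?:^|[^_#])ALPHA:  ', ($with_anchor ? 'match' : 'no match'), "\n";
print '#ALPHA rejected:   ', ($hash_prefix ? 'no' : 'yes'), "\n";
```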