in reply to Pattern matching when there are exception strings

You can put a negated character class before your matching text. In most regex dialects, character classes are written in square brackets, and a caret as the first character inside the brackets negates the whole class.

So, to match 'ALPHA' preceded by anything *but* an underscore or a hash, you'd start from something like: [^#_]ALPHA.
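A quick way to see what [^#_]ALPHA actually does is to run it against a few lines. This is a minimal Perl sketch; the sample strings and the helper name class_match are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $line contains ALPHA preceded by one
# character that is neither '#' nor '_' (the negated-class approach).
sub class_match {
    my ($line) = @_;
    return $line =~ /[^#_]ALPHA/ ? 1 : 0;
}

for my $line ('I AM ALPHA', 'I AM #ALPHA', 'I AM _ALPHA',
              'ALPHA, I AM', 'I AM XL5 ALPHA') {
    printf "%-16s %s\n", $line, class_match($line) ? 'match' : 'no match';
}
```

Because the class has to consume one character, 'ALPHA, I AM' is not matched, and nothing in the class rules out a preceding 'XL5 ', so 'I AM XL5 ALPHA' still matches.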

Re^2: Pattern matching when there are exception strings
by davidrw (Prior) on Sep 21, 2005 at 13:14 UTC
    That wouldn't match the string "ALPHA, I AM" ... also, a character class can't deal with the "XL5 ALPHA" exception.

    ... which leads to what the author is looking for: the zero-width negative look-behind assertion "(?<!pattern)" (see perlre).

    use strict;
    use warnings;

    while (<DATA>) {
        print /(?<!XL5 )(?<![#_])ALPHA/ ? "OK\n" : "NOT OK\n";
    }

    __DATA__
    I AM ALPHA AND GOOD ALPHA
    I AM ALPHA AND BETA
    I AM ALPHA AND #ALPHA
    I AM ALPHA AND _ALPHA
    I AM ALPHA AND XL5 ALPHA
    I AM #ALPHA
    I AM _ALPHA
    I AM XL5 ALPHA

    Output:

    OK
    OK
    OK
    OK
    OK
    NOT OK
    NOT OK
    NOT OK

      You're right, and I'll take my lumps for being wrong.

      But what I haven't quite figured out is *why* [^_#]ALPHA wouldn't match on 'ALPHA, I AM'. Is it because the character class has to match _some_ character? My train of thought was that the negation would be primary. "Is there an underscore or a hash leading?" "No."

        The (?<!pattern) construct is called "zero-width" because it matches without consuming any characters. A character class [...] is not zero-width: it has to match exactly one character.
        But what I haven't quite figured out is *why* [^_#]ALPHA wouldn't match on 'ALPHA, I AM'. Is it because the character class has to match _some_ character?
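One way to see the difference is to look at the text each pattern actually consumes ($&). A minimal Perl sketch; the helper name consumed and the sample strings are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text consumed by a successful match
# ($&), or undef if the pattern does not match.
sub consumed {
    my ($line, $re) = @_;
    return $line =~ $re ? $& : undef;
}

my $line     = 'I AM ALPHA';
my $class_re = qr/[^#_]ALPHA/;     # class must consume one character
my $look_re  = qr/(?<![#_])ALPHA/; # look-behind consumes nothing

print "class:       '", consumed($line, $class_re), "'\n";
print "look-behind: '", consumed($line, $look_re),  "'\n";
```

The class version consumes the space before ALPHA as well; the look-behind version consumes only ALPHA, which is also why it can still match at the very start of 'ALPHA, I AM'.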
        Yes -- it has to match exactly one character ... you can, however, use the quantifiers:
        /[A]B/;    # must be 'AB'
        /[A]?B/;   # 0 or 1, so 'AB' or 'B' or 'CB'
        /[A]+B/;   # 1 or more, so 'AB' or 'AAAB'
        /[A]*B/;   # 0 or more, so 'AB' or 'AAAB' or 'B' or 'CB'
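To check those comments, the sketch below runs each of the four patterns against the same sample strings and reports the substring matched ('-' for no match). It is only an illustration; first_match is a hypothetical helper name.

```perl
use strict;
use warnings;

# The patterns are unanchored, so each one can match anywhere in the string.
my @patterns = (qr/[A]B/, qr/[A]?B/, qr/[A]+B/, qr/[A]*B/);
my @strings  = ('AB', 'B', 'CB', 'AAAB');

# Hypothetical helper: leftmost substring matched by $re, or '-' if none.
sub first_match {
    my ($str, $re) = @_;
    return $str =~ $re ? $& : '-';
}

for my $re (@patterns) {
    print join("\t", $re, map { first_match($_, $re) } @strings), "\n";
}
```

Since nothing is anchored, [A]B and [A]?B find the trailing 'AB' inside 'AAAB', while [A]+B and [A]*B consume all of 'AAAB'.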
        My train of thought was that the negation would be primary. "Is there an underscore or a hash leading?" "No."
        "Is there a character that is (NOT) an underscore or a hash?"
        The "NOT" is there iff there's a leading caret.
Re^2: Pattern matching when there are exception strings
by chester (Hermit) on Sep 21, 2005 at 13:19 UTC
    That wouldn't match "ALPHA" on a line by itself; it demands a non-# non-_ character before the ALPHA. If you do it this way you probably want a zero-width assertion.

    My first thought is to do a global replacement of all the exceptions, to remove them from the data, and then search what's left for the strings you do want to count. Performance-wise I don't know how good this would be, but it's easier to read (and possibly easier to write if it has to be generated automatically for a large number of search strings).

    while (<>) {
        s/(?:_|#|XL5 )ALPHA//g;    # remove every excepted occurrence
        print if /ALPHA/;          # prints the line with the exceptions removed
    }
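As a sanity check, the strip-then-search idea can be run over the same sample lines used in davidrw's reply. This sketch works on a copy of each line so the original text is preserved; counts_after_strip is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: strip every excepted occurrence from a copy of
# the line, then report whether any ALPHA is left.
sub counts_after_strip {
    my ($line) = @_;
    (my $copy = $line) =~ s/(?:_|#|XL5 )ALPHA//g;
    return $copy =~ /ALPHA/ ? 1 : 0;
}

my @lines = (
    'I AM ALPHA AND GOOD ALPHA',
    'I AM ALPHA AND BETA',
    'I AM ALPHA AND #ALPHA',
    'I AM ALPHA AND _ALPHA',
    'I AM ALPHA AND XL5 ALPHA',
    'I AM #ALPHA',
    'I AM _ALPHA',
    'I AM XL5 ALPHA',
);

for my $line (@lines) {
    my $status = counts_after_strip($line) ? 'OK' : 'NOT OK';
    print "$status\n";
}
```

On these lines it agrees with the look-behind version: five OK followed by three NOT OK.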

    edit: davidrw beat me to it.

      Splicing out things as you suggest would leave the possibility (however remote) that a false match would be created by the pasted-together remnants. Like if you removed "XL5 ALPHA" from "ALXL5 ALPHAPHA". See my split-and-grep recommendation for a similar technique that isn't subject to the pasting problem.
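The pasting problem is easy to reproduce with the example string above. This minimal sketch compares the strip-then-search approach with the look-behind pattern from davidrw's reply.

```perl
use strict;
use warnings;

my $line = 'ALXL5 ALPHAPHA';

# Strip-then-search: removing "XL5 ALPHA" glues "AL" and "PHA" together.
(my $stripped = $line) =~ s/(?:_|#|XL5 )ALPHA//g;
my $strip_hit = $stripped =~ /ALPHA/ ? 1 : 0;    # false positive

# Look-behind version: nothing is removed, so no new text can be formed.
my $lookbehind_hit = $line =~ /(?<!XL5 )(?<![#_])ALPHA/ ? 1 : 0;

print "stripped text: $stripped\n";
print "strip-then-search: $strip_hit, look-behind: $lookbehind_hit\n";
```

The stripped text comes out as 'ALPHA', so strip-then-search reports a hit, while the look-behind pattern correctly rejects the only ALPHA in the string because it follows 'XL5 '.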

      Caution: Contents may have been coded under pressure.