in reply to regex for swear filter


My interpretation is that you don't want to match "patterns" that occur as part of larger words.
For this, look at the use of word boundaries.
I would suggest a look at perlrequick (perldoc) for such problems; below is what it says about this one.
The word anchor \b matches a boundary between a word character and a non-word character \w\W or \W\w:

    $x = "Housecat catenates house and cat";
    $x =~ /\bcat/;    # matches cat in 'catenates'
    $x =~ /cat\b/;    # matches cat in 'housecat'
    $x =~ /\bcat\b/;  # matches 'cat' at end of string
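Applied to a filter, the same anchors might be used roughly like this. This is only a minimal sketch: the word list and the `scrub` helper are hypothetical, made up for illustration, and are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical word list, for illustration only.
my @blocked = qw(darn heck);

# Build one alternation, anchored with \b on both sides so that
# only whole words match, never fragments of larger words.
my $alt = join '|', map { quotemeta($_) } @blocked;
my $re  = qr/\b(?:$alt)\b/i;

# Replace each matched whole word with asterisks of the same length.
sub scrub {
    my ($text) = @_;
    $text =~ s/($re)/'*' x length($1)/ge;
    return $text;
}

print scrub("Heck, the checkers set is missing."), "\n";
```

Here "Heck" is masked because it stands alone, while the "heck" inside "checkers" is left untouched.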

HTH
chimni

Replies are listed 'Best First'.
Re: Re: regex for swear filter
by Anonymous Monk on Feb 13, 2004 at 04:29 UTC

    How are you going to discuss the comparative advantages and disadvantages of various beasts of burden with a filter like that?

      When forced by a prudish management to solve a similar problem, I assigned a point system. Each regex that applied would add or subtract points. Only those matches passing a point threshold would be scrubbed.

      For example, it's more likely to be an intentional curse if it's at the beginning or ending of a word. It's more likely to be an intentional curse if it is the whole word (word boundaries on both ends). It's less likely if it appears buried in a word; these are not filtered, much to the relief of residents of Scunthorpe.
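A point system along these lines could be sketched as follows. The weights, the threshold, and the helper names (`score`, `mask`, `scrub`) are all assumptions for illustration; the post gives no actual numbers, and the harmless word 'cat' stands in for a real entry.

```perl
use strict;
use warnings;

# Hypothetical pattern, weights and threshold (not from the post).
my $word      = 'cat';
my $threshold = 2;

# Score one token: each rule that applies adds or subtracts points.
sub score {
    my ($token) = @_;
    return 0 unless $token =~ /\Q$word\E/i;
    my $points = 0;
    $points += 2 if $token =~ /^\Q$word\E/i;     # at the start of the word
    $points += 2 if $token =~ /\Q$word\E\z/i;    # at the end of the word
    $points -= 1 if $token =~ /\w\Q$word\E\w/i;  # buried inside the word
    return $points;                              # whole word scores 4
}

# Mask every occurrence of the pattern inside a token.
sub mask {
    my ($token) = @_;
    $token =~ s/\Q$word\E/'*' x length($word)/gie;
    return $token;
}

# Scrub only the tokens whose score reaches the threshold.
sub scrub {
    my ($text) = @_;
    $text =~ s{(\w+)}{ my $t = $1; score($t) >= $threshold ? mask($t) : $t }ge;
    return $text;
}

print scrub("Housecat catenates cat scatter"), "\n";
```

With these made-up weights the line printed is `House*** ***enates *** scatter`: matches at a word edge pass the threshold, while the one buried in "scatter" does not.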

      --
      [ e d @ h a l l e y . c c ]

        Whatever filter you make, it's easy to circumvent. Just witness all the "spam" and "nanny" filters that block emails or websites discussing breast cancer, or mentioning non-body parts like 'ass' and 'nipple', but allow texts mentioning 'V-I-A-G-R-A', 'H*T T!T$' or '\/\/3+ p|_|zz!35'.

        Swear filters are a technical solution to a social problem. Technical solutions to social problems usually don't work, and have bad side effects.

        Abigail