Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a fairly trivial question, but it's slightly tricky. I need to search for the word 'error' in a file, but the word must start and end on a word boundary. For example, 'This is an errorVariable.' should not be matched, but 'This is an error.' should be. I know I can use something like: if ( /\berror\b/i ) The problem is that this also matches text like 'Sending error.xls.txt to the file'. Such text should not be matched, as the word error does not end on a word boundary. Could anybody suggest a way to do this pattern matching? Thanks!

Replies are listed 'Best First'.
Re: Pattern Matching
by mirod (Canon) on Oct 04, 2001 at 21:02 UTC

    It looks like your definition of a word boundary differs from Perl's. For Perl, a word character (\w) is one of [a-zA-Z0-9_], so the . is not a word character, and the regexp engine finds a word boundary between error and the . that follows it.
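
    A minimal sketch of that behaviour, run against the sample strings from the question. The helper name plain_match is made up for illustration; only the regexp comes from the thread.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: true if the line contains 'error' delimited by \b.
    sub plain_match {
        my ($line) = @_;
        return $line =~ /\berror\b/i ? 1 : 0;
    }

    # Sample lines taken from the question.
    my @lines = (
        'This is an errorVariable.',
        'This is an error.',
        'Sending error.xls.txt to the file',
    );

    for my $line (@lines) {
        # \b sits between 'r' (a \w character) and '.' (a \W character),
        # so the file name matches just like the plain sentence does.
        printf "%-36s %s\n", $line, plain_match($line) ? 'match' : 'no match';
    }
    ```

    Only the errorVariable line fails to match, because 'r' and 'V' are both word characters and there is no boundary between them.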

    If you want to match "This is an error." but not "error.xls.txt" you're going to have to code it yourself.

    I don't see any sure way to do this, but /\berror\b(?!\S\w)/ (error, then a word boundary, not followed by a non-space character and then a word character) should work _most_ of the time.
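
    Here is a rough sketch of how that regexp behaves. The first three strings come from the question; the last two are invented here to show where "most of the time" stops holding. The helper name lookahead_match is hypothetical, and the /i flag is carried over from the question.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper wrapping the regexp suggested above.
    sub lookahead_match {
        my ($line) = @_;
        # (?!\S\w) is a negative lookahead: fail if 'error' is followed by
        # a non-space character and then a word character.
        return $line =~ /\berror\b(?!\S\w)/i ? 1 : 0;
    }

    my @lines = (
        'This is an errorVariable.',           # from the question
        'This is an error.',                   # from the question
        'Sending error.xls.txt to the file',   # from the question
        'Got an error, retrying',              # invented example
        'This code is error-prone',            # invented example
    );

    for my $line (@lines) {
        printf "%-36s %s\n", $line, lookahead_match($line) ? 'match' : 'no match';
    }
    ```

    The file name is now rejected and 'error.' at the end of a sentence still matches, but 'error-prone' is rejected as well, since '-p' also looks like a non-space character followed by a word character. Whether that counts as a miss depends on what you want to call a word.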

    I am not sure I would trust a program here. If you really want 100% accuracy, you're probably better off finding a way to log and check the matches, if only so that you can fix your regexp when you find out it has failed.
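
    One possible way to do that logging, as a sketch only: record the lines the stricter regexp accepts, and separately the lines where plain \b matched but the lookahead rejected them, so both lists can be checked by eye. The input text and the MATCHED/REJECTED labels are made up for illustration; a real script would read the actual file instead of an in-memory string.

    ```perl
    use strict;
    use warnings;

    # Hypothetical input; in practice this would be the file being searched.
    my $text = join "\n",
        'This is an error.',
        'Sending error.xls.txt to the file',
        'This code is error-prone';

    # Read the string through an in-memory filehandle to keep the sketch self-contained.
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

    my (@hits, @near_misses);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /\berror\b(?!\S\w)/i) {
            push @hits, "$.: $line";          # $. is the current line number
        }
        elsif ($line =~ /\berror\b/i) {
            # Plain \b matched but the lookahead rejected it: worth a manual look.
            push @near_misses, "$.: $line";
        }
    }
    close $fh;

    print "MATCHED  $_\n" for @hits;
    print "REJECTED $_\n" for @near_misses;
    ```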

Re: Pattern Matching
by Fletch (Bishop) on Oct 04, 2001 at 21:05 UTC

    You need to read perldoc perlre more carefully. `error.xls.txt' definitely does match /\berror\b/i, because a word boundary (as matched by \b) is a transition between a character that matches \w and something that matches \W. You've got an impedance mismatch between what you consider a word and a word boundary and what the regexp engine thinks they are. In this specific instance you could probably catch it with /\berror\b(?!\.\w+)/i, which would rule out error followed by `.extension' or `.txt' (as examples).
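
    A small sketch of that narrower regexp in action. The helper name not_extension is hypothetical, and the last test string is invented here to show that only a dot followed by word characters is excluded.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper wrapping the regexp suggested above.
    sub not_extension {
        my ($line) = @_;
        # (?!\.\w+) fails only when 'error' is followed by a literal dot
        # and at least one word character, i.e. something like '.xls'.
        return $line =~ /\berror\b(?!\.\w+)/i ? 1 : 0;
    }

    my @lines = (
        'This is an errorVariable.',           # from the question
        'This is an error.',                   # from the question
        'Sending error.xls.txt to the file',   # from the question
        'This code is error-prone',            # invented example
    );

    for my $line (@lines) {
        printf "%-36s %s\n", $line, not_extension($line) ? 'match' : 'no match';
    }
    ```

    The sentence ending in 'error.' still matches because nothing follows the dot, and the hyphenated case matches too, since the lookahead only looks for a dot.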