daviddhall has asked for the wisdom of the Perl Monks concerning the following question:

Ugh...

How do I write a regular expression that matches a whole WORD, for something like this (simple version)?
The value I want to match is "radio".

The values I'm matching against are:
"radiohead": NOT successful
"turn off your radio": Successful
"radio": Successful
"jill has a radio that is black": Successful

I only want an exact word match. I thought m/\Wradio\W/ would work, but it fails whenever there are no characters before or after the matched value. Surely there's some nifty little solution here. Thanks!
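For illustration, the sketch below runs that pattern over the four test strings; the helper name `naive_match` is hypothetical. Because `\W` has to consume an actual character, the match fails both for "radio" on its own and for "turn off your radio", where nothing follows the word.

```perl
use strict;
use warnings;

# The test strings from the question, with the expected result (1 = should match).
my %expected = (
    'radiohead'                      => 0,
    'turn off your radio'            => 1,
    'radio'                          => 1,
    'jill has a radio that is black' => 1,
);

# Hypothetical helper wrapping the pattern from the question.
# \W must consume a real non-word character, so it cannot match at the
# very start or end of the string.
sub naive_match { return $_[0] =~ /\Wradio\W/ ? 1 : 0 }

for my $s (sort keys %expected) {
    my $got = naive_match($s);
    printf "%-34s expected %d, got %d%s\n", qq{"$s"}, $expected{$s}, $got,
        ($got == $expected{$s} ? '' : '  <-- wrong');
}
```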

Re: Word Boundary Matching
by rchiav (Deacon) on Apr 19, 2001 at 03:34 UTC
    A couple of things here. To match at a word boundary, you'll want to use \b. If you're storing the words you match against in an array, you'll also want to quote the variable with \Q and \E so that any regex metacharacters in it are matched literally.
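    A minimal sketch of why the quoting matters, using a hypothetical `has_word` helper and a made-up word `v1.2`: without \Q...\E, the `.` is a regex wildcard and matches any character.

```perl
use strict;
use warnings;

# Hypothetical helper: does $line contain $word as a whole word?
# \Q...\E quotes regex metacharacters in $word; \b marks word boundaries.
sub has_word {
    my ($line, $word) = @_;
    return $line =~ /\b\Q$word\E\b/ ? 1 : 0;
}

# Without quoting, the "." in "v1.2" matches any character, including "x".
my $unquoted = ('version v1x2 released' =~ /\bv1.2\b/) ? 1 : 0;
my $quoted   = has_word('version v1x2 released', 'v1.2');

print "unquoted: $unquoted, quoted: $quoted\n";   # unquoted: 1, quoted: 0
```

    One caveat: \b only lines up with the edges of the word when those edges are word characters, so a word that starts or ends with punctuation, such as c++, will not match this way.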

    Here's an example from something I use to filter out the lines of a file that contain any of the words in an array.

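    sub filter_line {
        my ($i, $line);
        $line = shift;              # first argument: the line to test
        foreach $i (@_) {           # remaining arguments: the words to look for
            # \b marks a word boundary; \Q...\E quotes regex metacharacters in $i
            if ($line =~ /\b\Q$i\E\b/) { return 1 }
        }
        return 0;
    }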
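    A sketch of how a function like this might be used to drop matching lines. It restates filter_line with `my ($line, @words) = @_` (same logic), and the stop words and input lines are invented for the example; with a real file you would loop over a filehandle instead of an array.

```perl
use strict;
use warnings;

# Same logic as filter_line above: true if $line contains any of the
# remaining arguments as a whole word.
sub filter_line {
    my ($line, @words) = @_;
    foreach my $w (@words) {
        return 1 if $line =~ /\b\Q$w\E\b/;
    }
    return 0;
}

# Hypothetical stop words and input lines.
my @stop  = ('radio', 'tv');
my @lines = (
    "turn off your radio\n",
    "I like radiohead\n",
    "the tv is on\n",
    "nothing to see here\n",
);

# Keep only the lines that do NOT contain a stop word as a whole word.
my @kept = grep { !filter_line($_, @stop) } @lines;
print @kept;
```

    Note that "I like radiohead" survives, because "radio" inside "radiohead" has no word boundary after it.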

    Hope this helps..
    Rich

    Addition: read perlre and look at what constitutes a word boundary. In your example, /\bradio\b/ will match radio and not radiohead, but it will also match radio-head or radio.head, etc. However, the underscore (_) counts as part of a word, so it won't match radio_head.
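    A small sketch that checks the cases mentioned above; `is_whole_radio` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if "radio" appears as a whole word.
# \b sits between a word character (\w: in ASCII, letters, digits and "_")
# and a non-word character, or between a \w and the start/end of the string.
sub is_whole_radio { return $_[0] =~ /\bradio\b/ ? 1 : 0 }

for my $s ('radio', 'radiohead', 'radio-head', 'radio.head', 'radio_head') {
    printf "%-12s %s\n", $s, is_whole_radio($s) ? 'match' : 'no match';
}
```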

Re: Word Boundary Matching
by Mungbeans (Pilgrim) on Apr 19, 2001 at 11:34 UTC
    I think you need to extend your match so that radio is preceded by either the start of the string or a non-word character, and followed by either a non-word character or the end of the string. This covers cases where radio is in the middle of a phrase as well as at the start or end of the sentence.

    Use | inside parentheses to specify alternatives.

    Try something like below:

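    #!/usr/bin/perl -w
    use strict;

    my @sentence_list = (
        'radiohead',
        'turn off your radio',
        'jill has a radio that is black',
        'the stereo is broken',
        'radio for happiness',
    );

    foreach my $sentence (@sentence_list) {
        # "radio" must be preceded by the start of the string or a non-word
        # character, and followed by a non-word character or the end of the string
        if ($sentence =~ /(^|\W)radio(\W|$)/) {
            print "Found\n";
        } else {
            print "Not found\n";
        }
    }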

    This gives results:

    Not found
    Found
    Found
    Not found
    Found
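
    One thing to keep in mind if you reuse this pattern with //g: the (^|\W) and (\W|$) groups consume the neighbouring characters, so two adjacent occurrences can share a separator and the second one is missed. The sketch below swaps in zero-width lookbehind/lookahead assertions, which is a different technique from the one in this reply; `count_matches` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many times $re matches in $str using //g.
sub count_matches {
    my ($str, $re) = @_;
    my $n = 0;
    $n++ while $str =~ /$re/g;
    return $n;
}

my $text = 'radio radio';

# Pattern from the reply: the trailing (\W|$) consumes the space, so the
# second "radio" has nothing left to satisfy (^|\W).
my $consuming = qr/(^|\W)radio(\W|$)/;

# Lookbehind/lookahead version: checks the neighbours without consuming them.
my $lookaround = qr/(?<!\w)radio(?!\w)/;

print count_matches($text, $consuming),  "\n";   # 1
print count_matches($text, $lookaround), "\n";   # 2
```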
Re: Word Boundary Matching
by petdance (Parson) on Apr 19, 2001 at 16:06 UTC
    Look into the \b metacharacter, as in /\bradio\b/.
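    A quick sketch applying /\bradio\b/ to the four strings from the question:

```perl
use strict;
use warnings;

my @tests = (
    'radiohead',
    'turn off your radio',
    'radio',
    'jill has a radio that is black',
);

# \b also matches at the very start and end of the string (next to a word
# character), so no separate ^ or $ alternatives are needed.
my @results = map { /\bradio\b/ ? 1 : 0 } @tests;

for my $i (0 .. $#tests) {
    printf "%-34s %s\n", qq{"$tests[$i]"},
        $results[$i] ? 'Successful' : 'NOT successful';
}
```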

    # Andy Lester  http://www.petdance.com  AIM:petdance
    %_=split';','.; Perl ;@;st a;m;ker;p;not;o;hac;t;her;y;ju';
    print map $_{$_}, split //,
    'andy@petdance.com'
    
Re: Word Boundary Matching
by Anonymous Monk on Apr 19, 2001 at 16:58 UTC

    Try:

    m/\Wradio\W|^radio\W|\Wradio$/

    I wanted to really say something like:

    m/[^\W]radio[\W$]/

    Except that means something radically different from what I'm trying to say, which is: "Match radio where it either starts at the beginning of a line or follows a non-word character, and either ends at the end of a line or is followed by a non-word character." For one thing, [^\W] matches a single word character, not the start of a line. Since the pattern above doesn't say what I mean, you have to settle for the first one (unless someone else has a simpler way of doing it).
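    A sketch checking the first pattern against some of the question's strings suggests one gap: a string that is exactly "radio" has no character on either side, so none of the three branches applies. Adding a fourth branch ^radio$ (or simply using \b) closes it. The helper names `alt_match` and `fixed_match` are hypothetical.

```perl
use strict;
use warnings;

# The alternation from this reply, and the same thing with a fourth branch
# for a string that is exactly "radio".
sub alt_match   { return $_[0] =~ /\Wradio\W|^radio\W|\Wradio$/ ? 1 : 0 }
sub fixed_match { return $_[0] =~ /\Wradio\W|^radio\W|\Wradio$|^radio$/ ? 1 : 0 }

for my $s ('radio', 'turn off your radio', 'radio for happiness', 'radiohead') {
    printf "%-24s original: %d  with ^radio\$: %d\n",
        qq{"$s"}, alt_match($s), fixed_match($s);
}
```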

    Cheers...james