Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I've got a problem with word boundaries in regexps.
$string_to_check="yyy foo-bar doo zzz"; if ($string_to_check =~ /\b$check\b/) { ... }
The if-statement is true if $check="foo-bar doo", and it's also true if $check="bar doo". I need it to match 'foo-bar doo' only, not 'bar doo'. Is there a way to include '-' in \b? Or should I do it another way? Thanks in advance! Lynn

Replies are listed 'Best First'.
Re: Word boundary in regexp
by ikegami (Patriarch) on Jan 25, 2012 at 21:48 UTC
    \b
    is equivalent to
    (?:(?<=\w)(?!\w)|(?<!\w)(?=\w))

    If you want to include "-" as a word character, just change \w to [\w-].

    (?:(?<=[\w-])(?![\w-])|(?<![\w-])(?=[\w-]))

    All together, that gives

    my $my_slash_b = qr/(?:(?<=[\w-])(?![\w-])|(?<![\w-])(?=[\w-]))/;
    /$my_slash_b$check$my_slash_b/

    But since you "know" that $check starts and ends with a word character, the above simplifies to

    /(?<![\w-])$check(?![\w-])/
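    A minimal sketch of the simplified pattern applied to the string from the question, next to plain \b for comparison. The helper name matches_whole is hypothetical, and wrapping $check in \Q...\E is an added precaution (not part of the pattern above) so that any regex metacharacters in $check are matched literally.

```perl
use strict;
use warnings;

my $string_to_check = "yyy foo-bar doo zzz";

# Hypothetical helper (not from the thread): true if $check occurs in $text
# with no word character or '-' directly before or after it.
sub matches_whole {
    my ($text, $check) = @_;
    # \Q...\E quotes metacharacters in $check; this is an assumption added here.
    return $text =~ /(?<![\w-])\Q$check\E(?![\w-])/ ? 1 : 0;
}

for my $check ("foo-bar doo", "bar doo") {
    my $plain  = $string_to_check =~ /\b\Q$check\E\b/      ? "match" : "no match";
    my $custom = matches_whole($string_to_check, $check)   ? "match" : "no match";
    print "$check: \\b => $plain, custom => $custom\n";
}
```

    With plain \b both values of $check match, because '-' is a non-word character and so a boundary sits between '-' and 'bar'. With the lookarounds, 'bar doo' is rejected because it is directly preceded by '-'.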

    If you wanted to go further and count any non-whitespace character as a word character, you're left with

    my $my_slash_b = qr/(?:(?<=\S)(?!\S)|(?<!\S)(?=\S))/;
    /$my_slash_b$check$my_slash_b/

    and thus

    /(?<!\S)$check(?!\S)/
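    To illustrate the trade-off of the whitespace-delimited variant, here is a small sketch. The helper name matches_token and the sample text are made up for this example, and \Q...\E is again an added precaution. Note that with this variant trailing punctuation counts as part of the "word".

```perl
use strict;
use warnings;

# Hypothetical helper: true if $check appears in $text delimited only by
# whitespace or by the start/end of the string.
sub matches_token {
    my ($text, $check) = @_;
    return $text =~ /(?<!\S)\Q$check\E(?!\S)/ ? 1 : 0;
}

# Sample text (an assumption for illustration): note the comma after "doo".
my $text = 'yyy foo-bar doo, zzz';

for my $check ('foo-bar doo', 'foo-bar doo,', 'bar doo,', 'zzz') {
    printf "%-14s %s\n", "'$check'", matches_token($text, $check) ? "match" : "no match";
}
```

    Here 'foo-bar doo' does not match because it is followed by a comma, while 'foo-bar doo,' does. If punctuation next to the phrase should be allowed, the [\w-] version is probably the better fit.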
Re: Word boundary in regexp
by Eliya (Vicar) on Jan 25, 2012 at 21:30 UTC

    See also Defining Characters in Word Boundary?.

    Based on the suggestion in the first reply there:

    my $string_to_check = "yyy foo-bar doo zzz";
    my $set = '[\w-]';
    my $b = qr/(?<=$set)(?!$set)|(?<!$set)(?=$set)/;
    for my $check ("foo-bar doo", "bar doo") {
        if ($string_to_check =~ /$b$check$b/) {
            print "$check matched\n";
        }
    }
Re: Word boundary in regexp
by JavaFan (Canon) on Jan 25, 2012 at 21:16 UTC
    Another way, using "modern" regexp verbs:
    /$check(?:\S(*COMMIT)(*FAIL))?+|\S+(*SKIP)(*FAIL)/
Re: Word boundary in regexp
by JavaFan (Canon) on Jan 25, 2012 at 21:04 UTC
Re: Word boundary in regexp
by Anonymous Monk on Jan 25, 2012 at 22:27 UTC
    Wow, thanks! Everything works fine! Lynn