Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I've a piece of code as follows:

if ($sentence =~ /\b$symbol\b/) {
    # do something
}
$symbol can be a word or a comma. The above regex doesn't work if $symbol is a comma. Basically I wanted to test for the presence of $symbol in $sentence.

Please enlighten me.

TIA :)

Re: Regex help
by japhy (Canon) on Jan 14, 2006 at 16:20 UTC
    You don't want to use "word boundary", but rather "boundary". There is no shortcut for boundary, so you'll have to use:
    /(?:(?=\w)(?<!\w)|(?=\W)(?<!\W))($thing)(?:(?<=\w)(?!\w)|(?<=\W)(?!\W))/
    You could also use the (?(...)TRUE|FALSE) assertion to clean that up a bit:
    /(?(?=\w)(?<!\w)|(?<!\W))($thing)(?(?<=\w)(?!\w)|(?!\W))/

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart

      I'm confused. Aren't your left and right side zero length assertions:

      (?(?=\w)(?<!\w)|(?<!\W))

      and

      (?(?<=\w)(?!\w)|(?!\W))

      just synonymous with \b?

        No, \b says there's a word character on exactly one side, like: (?:(?=\w)(?<!\w)|(?!\w)(?<=\w)).
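One place where the two differ can be checked directly. The sketch below is only an illustration: the test string ',foo' is made up, and it compares just the left-hand assertion from the conditional regex above against a plain \b, with a comma at the very start of the string.

```perl
use strict;
use warnings;

# Made-up test string: a comma at the very start, followed by a word.
my $text = ',foo';

# Plain \b needs a word character on exactly one side of the position.
# At offset 0 there is nothing on the left and a comma on the right,
# so \b cannot match there.
my $with_b = $text =~ /^\b,/ ? 1 : 0;

# The conditional assertion: if the next character is a word character,
# require no word character behind; otherwise require no non-word
# character behind (which is also true at the start of the string).
my $with_conditional = $text =~ /^(?(?=\w)(?<!\w)|(?<!\W)),/ ? 1 : 0;

print 'plain \b:    ', $with_b, "\n";
print 'conditional: ', $with_conditional, "\n";
```

With this input the plain \b version should report 0 and the conditional version 1, so the two assertions are not interchangeable.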
Re: Regex help
by BrowserUk (Patriarch) on Jan 14, 2006 at 16:26 UTC

    With the comma bracketed by \b, that will only match if there are no spaces (or other non-word characters) on either side of the comma. I.e., in this sentence:

    this,that & this, that & this ,that & this , that

    only the first comma will be matched.
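A quick way to confirm this is to collect the offset of every comma that the pattern accepts. This is a minimal sketch using the sample sentence above; the variable names are just for illustration.

```perl
use strict;
use warnings;

my $sentence = 'this,that & this, that & this ,that & this , that';

# Walk the string with /g and record where each successful match starts.
my @offsets;
while ($sentence =~ /\b,\b/g) {
    push @offsets, $-[0];    # $-[0] holds the start offset of the last match
}

print "commas matched at offsets: @offsets\n";
```

Only the comma at offset 4 (the one in "this,that") has a word character on both sides, so it should be the only offset reported.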


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Regex help
by Corion (Patriarch) on Jan 14, 2006 at 16:16 UTC

    The regex should still "work" when $symbol is a comma. Maybe you should show us some input data, one string where it succeeds and one where it fails, and also the values for $symbol.

      Let's say I've the following:

      $sentence: That is naughty cat.
      $symbol: naugh

      That passes the test but I need it to fail because "naugh" is only part of "naughty".

        Sorry, what I meant to say is: if I remove the two '\b', both a partial word and a comma are matched. But what I need is to match either a comma or a complete word in the given sentence.
Re: Regex help
by Eimi Metamorphoumai (Deacon) on Jan 14, 2006 at 18:53 UTC
    The problem is that \b marks a "word boundary", that is, a place where one side must be a word character (alphanumeric or _) and the other side isn't a word character. If the text in $symbol begins and ends with word characters, then it does what you want. But if the text in $symbol is, say, a comma, which isn't a word character, then your test is ensuring that it's surrounded on both sides by word characters. So what you want is not to assert a boundary, but rather that $symbol not be preceded by or followed by wordchars.
    if ($sentence =~ /(?<!\w)$symbol(?!\w)/) {
        # do something
    }
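To see how the lookaround version behaves, here is a small sketch run over a few made-up inputs. The helper name matches_lookaround and the test sentences are hypothetical, and the \Q...\E around $symbol is an addition (it escapes any regex metacharacters in the symbol); neither comes from the post above.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $symbol occurs in $sentence with no word
# character directly before it and none directly after it.
sub matches_lookaround {
    my ($sentence, $symbol) = @_;
    # \Q...\E is an assumption added here to escape metacharacters.
    return $sentence =~ /(?<!\w)\Q$symbol\E(?!\w)/ ? 1 : 0;
}

# Made-up test cases: [ sentence, symbol ]
my @cases = (
    [ 'That is a naughty cat.', 'naughty' ],   # complete word
    [ 'That is a naughty cat.', 'naugh'   ],   # only part of a word
    [ 'this , that',            ','       ],   # comma with spaces around it
    [ 'this, that',             ','       ],   # comma touching a word
);

my @lookaround_results = map { matches_lookaround(@$_) } @cases;

for my $i (0 .. $#cases) {
    printf "%-26s %-10s %d\n", "'$cases[$i][0]'", "'$cases[$i][1]'",
        $lookaround_results[$i];
}
```

The complete word matches and the partial word does not. Note that a comma is accepted only when it is not adjacent to a word character, so the comma in 'this, that' is rejected here because it directly follows the "s".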
Re: Regex help
by pKai (Priest) on Jan 14, 2006 at 22:33 UTC

    If I read the requirement of AM correctly, he wants to check for the presence of a $symbol in a $sentence, where the beginning and end of $symbol should also be treated as word boundaries, provided they are word characters.

    For this I would propose the following regex (using a conditional pattern as suggested by japhy above):

    if ($sentence =~ /(?(?=\w)(?:\b))$symbol(?(?<=\w)(?:\b))/) {
        # do something
    }
      This gets my vote as "most likely intention".
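For comparison, the conditional-\b pattern proposed by pKai can be run over a few made-up inputs. This is a sketch, not a definitive implementation: the helper name matches_conditional and the test sentences are hypothetical, and \Q...\E is an added assumption to escape regex metacharacters in $symbol.

```perl
use strict;
use warnings;

# Hypothetical helper: require \b before $symbol only when it starts with
# a word character, and \b after it only when it ends with one.
sub matches_conditional {
    my ($sentence, $symbol) = @_;
    # \Q...\E is an assumption added here to escape metacharacters.
    return $sentence =~ /(?(?=\w)\b)\Q$symbol\E(?(?<=\w)\b)/ ? 1 : 0;
}

# Made-up test cases: [ sentence, symbol ]
my @inputs = (
    [ 'That is a naughty cat.', 'naughty' ],   # complete word
    [ 'That is a naughty cat.', 'naugh'   ],   # only part of a word
    [ 'this , that',            ','       ],   # comma with spaces around it
    [ 'this, that',             ','       ],   # comma touching a word
);

my @conditional_results = map { matches_conditional(@$_) } @inputs;

for my $i (0 .. $#inputs) {
    printf "%-26s %-10s %d\n", "'$inputs[$i][0]'", "'$inputs[$i][1]'",
        $conditional_results[$i];
}
```

Here the partial word "naugh" is rejected, while a comma is accepted whether or not it touches a neighbouring word, which seems closest to the behaviour the original question asks for.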