Re: Regex help
by japhy (Canon) on Jan 14, 2006 at 16:20 UTC
|
You don't want to use "word boundary", but rather "boundary". There is no shortcut for boundary, so you'll have to use:
/(?:(?=\w)(?<!\w)|(?=\W)(?<!\W))($thing)(?:(?<=\w)(?!\w)|(?<=\W)(?!\W))/
You could also use the (?(...)TRUE|FALSE) assertion to clean that up a bit:
/(?(?=\w)(?<!\w)|(?<!\W))($thing)(?(?<=\w)(?!\w)|(?!\W))/
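For anyone who wants to compare the two forms above, here is a minimal sketch that runs both against a few sample strings. The helper name boundary_patterns and the sample inputs are made up for illustration, and quotemeta is used on the assumption that $thing is literal text rather than a sub-pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: build the long and the conditional form of the
# "boundary" pattern for a literal $thing.
sub boundary_patterns {
    my ($thing) = @_;
    my $q = quotemeta $thing;
    my $long  = qr/(?:(?=\w)(?<!\w)|(?=\W)(?<!\W))($q)(?:(?<=\w)(?!\w)|(?<=\W)(?!\W))/;
    my $short = qr/(?(?=\w)(?<!\w)|(?<!\W))($q)(?(?<=\w)(?!\w)|(?!\W))/;
    return ($long, $short);
}

# Sample inputs (assumed for illustration): [ sentence, thing ].
my @cases = (
    [ 'this,that',   ','    ],
    [ 'this , that', ','    ],
    [ 'this, that',  'that' ],
    [ 'this, that',  'tha'  ],
);

my @results;
for my $case (@cases) {
    my ($sentence, $thing) = @$case;
    my ($long, $short) = boundary_patterns($thing);
    my $l = $sentence =~ $long  ? 1 : 0;
    my $s = $sentence =~ $short ? 1 : 0;
    push @results, [ $l, $s ];
    printf "%-14s %-7s long=%d short=%d\n", "'$sentence'", "'$thing'", $l, $s;
}
```

Both forms should agree on every case. Note that a comma only matches here when it has a word character (or the string edge) on each side, so the comma in 'this , that' is not found.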
| [reply] [d/l] [select] |
No, \b says there's a word character on exactly one side, like:
(?:(?=\w)(?<!\w)|(?!\w)(?<=\w)).
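A quick way to check that equivalence is to list the positions at which each zero-width pattern matches. This is only a sketch; the positions helper and the sample string are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return every offset at which a zero-width
# pattern matches in $str.
sub positions {
    my ($re, $str) = @_;
    my @pos;
    while ($str =~ /$re/g) {
        push @pos, pos($str);
    }
    return @pos;
}

my $text     = 'this, that';
my @builtin  = positions(qr/\b/, $text);
my @expanded = positions(qr/(?:(?=\w)(?<!\w)|(?!\w)(?<=\w))/, $text);

print "\\b:        @builtin\n";
print "expanded:  @expanded\n";
```

For this string both lists come out as 0 4 6 10: the start and end of each of the two words.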
| [reply] [d/l] |
Re: Regex help
by BrowserUk (Patriarch) on Jan 14, 2006 at 16:26 UTC
|
With the comma bracketed by \b, that will only match if there are no spaces (or other non-word characters) on either side of the comma. I.e., in this sentence:
"this,that & this, that & this ,that & this , that"
only the first comma will be matched.
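A minimal sketch to confirm this, collecting the offset of every comma that /\b,\b/ matches in that sentence (the variable names are mine):

```perl
use strict;
use warnings;

my $sentence = 'this,that & this, that & this ,that & this , that';

# Collect the start offset of every comma that has a word boundary
# on both sides.
my @offsets;
while ($sentence =~ /\b,\b/g) {
    push @offsets, $-[0];
}

print "commas matched at offsets: @offsets\n";
```

Only offset 4 is reported, which is the comma in "this,that".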
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] |
Re: Regex help
by Corion (Patriarch) on Jan 14, 2006 at 16:16 UTC
|
The regex should still "work" when $symbol is a comma. Maybe you should show us some input data, one string where it succeeds and one where it fails, and also the values for $symbol.
| [reply] [d/l] [select] |
Sorry, what I meant to say is: if I remove the two '\b', both a partial word and a comma are matched. But what I need is to match either a comma or a complete word in the given sentence.
| [reply] |
Re: Regex help
by Eimi Metamorphoumai (Deacon) on Jan 14, 2006 at 18:53 UTC
|
The problem is that \b marks a "word boundary", that is, a place where one side must be a word character (alphanumeric or _) and the other side isn't a word character. If the text in $symbol begins and ends with word characters, then it does what you want. But if the text in $symbol is, say, a comma, which isn't a word character, then your test is ensuring that it's surrounded on both sides by word characters. So what you want is not to assert a boundary, but rather that $symbol not be preceded or followed by word characters.
if ($sentence =~ /(?<!\w)$symbol(?!\w)/) {
    # do something
}
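Here is a small sketch of how that pattern behaves on a few inputs. The symbol_isolated helper and the sample strings are assumptions for illustration, and \Q...\E is added on the assumption that $symbol is literal text rather than a regex.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $symbol occurs with no word character
# immediately before or after it. \Q...\E treats $symbol as literal text.
sub symbol_isolated {
    my ($sentence, $symbol) = @_;
    return $sentence =~ /(?<!\w)\Q$symbol\E(?!\w)/ ? 1 : 0;
}

for my $case (
    [ 'this , that', ','    ],   # comma with a space on both sides
    [ 'this, that',  ','    ],   # comma directly after a word
    [ 'this, that',  'that' ],   # complete word
    [ 'this, that',  'tha'  ],   # partial word
) {
    my ($sentence, $symbol) = @$case;
    printf "%-14s %-7s %s\n", "'$sentence'", "'$symbol'",
        symbol_isolated($sentence, $symbol) ? 'match' : 'no match';
}
```

The complete word matches and the partial word does not. The comma matches in 'this , that', but not in 'this, that', where a word character sits directly in front of it.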
| [reply] [d/l] [select] |
Re: Regex help
by pKai (Priest) on Jan 14, 2006 at 22:33 UTC
|
If I read AM's requirement correctly, he wants to check for the presence of $symbol in $sentence, where the beginning and end of $symbol should also fall on word boundaries, provided they are word characters.
For this I would propose the following regex (using conditional pattern as suggested by japhy above):
if ($sentence =~ /(?(?=\w)(?:\b))$symbol(?(?<=\w)(?:\b))/) {
    # do something
}
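A minimal sketch of that conditional pattern on a few inputs. The symbol_present helper and the sample strings are assumptions for illustration, and \Q...\E is added on the assumption that $symbol is literal text.

```perl
use strict;
use warnings;

# Hypothetical helper: require a word boundary at an end of $symbol only
# when that end is a word character. \Q...\E treats $symbol as literal text.
sub symbol_present {
    my ($sentence, $symbol) = @_;
    return $sentence =~ /(?(?=\w)(?:\b))\Q$symbol\E(?(?<=\w)(?:\b))/ ? 1 : 0;
}

for my $case (
    [ 'this, that',  ','     ],   # comma directly after a word
    [ 'this , that', ','     ],   # comma with a space on both sides
    [ 'this, that',  'that'  ],   # complete word
    [ 'this, that',  'tha'   ],   # partial word
    [ 'this, that',  'this,' ],   # starts with a word char, ends with a comma
) {
    my ($sentence, $symbol) = @$case;
    printf "%-14s %-8s %s\n", "'$sentence'", "'$symbol'",
        symbol_present($sentence, $symbol) ? 'match' : 'no match';
}
```

Here a comma matches wherever it appears, complete words match, and only the partial word 'tha' is rejected.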
| [reply] [d/l] [select] |
This gets my vote as "most likely intention".
| [reply] |