Word boundary '\B'

prasadbabu has asked for the wisdom of the Perl Monks concerning the following question:

Monks, today my colleague asked me to replace the entities &#x0007E;&#x0007E;&#x0007E; with ~~~ in a small file. On both sides of the entities there will be letters, digits or underscores. So I wrote a small script, but I found some strange behaviour.

So I tried a small sample string and got the same result, as shown below:

First test:
-----------
$str = 'abac 123 afa123f';
$str =~ s|\B123\B|***|g;
print $str;
Output I got, as expected:
--------------------------
abac 123 afa***f


Second test:
------------
$str = 'abac &#x0007E;&#x0007E;&#x0007E; afa&#x0007E;&#x0007E;&#x0007E;f';
$str =~ s|\B\&\#x0007E\;\&\#x0007E\;\&\#x0007E\;\B|~~~|g;
print $str;
Output I got:
-------------
abac ~~~ afa&#x0007E;&#x0007E;&#x0007E;f
Expected output:
----------------
abac &#x0007E;&#x0007E;&#x0007E; afa~~~f

To be sure, I also went through the documentation, and it seems to say what I expected. Why this strange behaviour? Where am I going wrong?

Prasad

Re: Word boundary '\B' - Question by Sidhekin (Priest) on Aug 21, 2006 at 15:29 UTC
Neither ' ', '&', nor ';' is a word character, so there is no word boundary between ' ' and '&' or between ';' and ' ', and \B matches there. On the other hand, 'a' and 'f' are word characters (and '&' and ';' still aren't), so there are your word boundaries, and \B fails.

You probably want either negative or positive lookahead/lookbehind for (non-)whitespace instead. Negative version:

`$str = 'abac &#x0007E;&#x0007E;&#x0007E; afa&#x0007E;&#x0007E;&#x0007E;f';`
`$str =~ s|(?<!\s)\&\#x0007E\;\&\#x0007E\;\&\#x0007E\;(?!\s)|~~~|g;`
`print $str;`

Positive lookahead/lookbehind won't match at the end/beginning of the string:

`$str = 'abac &#x0007E;&#x0007E;&#x0007E; afa&#x0007E;&#x0007E;&#x0007E;f';`
`$str =~ s|(?<=\S)\&\#x0007E\;\&\#x0007E\;\&\#x0007E\;(?=\S)|~~~|g;`
`print $str;`

`print "Just another Perl ${\(trickster and hacker)},"`
The Sidhekin proves Sidhe did it!
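To see exactly where the boundaries described above fall, a quick sketch can insert a '|' at every position where \b matches; \B matches at every other position. The helper name show_boundaries is made up for illustration.

```perl
use strict;
use warnings;

# Insert '|' at every position where \b matches, so the word
# boundaries become visible. \B matches at every other position.
sub show_boundaries {
    my ($s) = @_;
    (my $marked = $s) =~ s/\b/|/g;
    return $marked;
}

print show_boundaries('abac 123 afa123f'), "\n";
# prints |abac| |123| |afa123f|
print show_boundaries('abac &#x0007E;&#x0007E;&#x0007E; afa&#x0007E;&#x0007E;&#x0007E;f'), "\n";
```

In the second line of output, the first entity sequence has no '|' on its outer edges (' &' and '; '), which is why \B succeeds there, while 'afa|&' shows the boundary that makes \B fail for the second sequence.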
Re^2: Word boundary '\B' - Question by prasadbabu (Prior) on Aug 21, 2006 at 15:46 UTC
Sidhekin, thanks for the clarification. I was misled by the following line from the Perl in 21 Days book:

/\Bdef\B/ matches cdefg or abcdefghi, but not def, defghi, or abcdef.

But when I re-read the documentation, I got a clear answer. From the documentation:

A word boundary (\b) is a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order)

Prasad
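The book's sentence is accurate for the strings it lists; what it leaves out is that \B is also true between two non-word characters. A small check along these lines shows both cases. The helper hit and the test strings ' & ' and 'a&b' are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Returns 1 if $re matches $s, 0 otherwise.
sub hit { my ($re, $s) = @_; return $s =~ $re ? 1 : 0 }

# The book's examples: /\Bdef\B/ matches 'cdefg' and 'abcdefghi'
# but not 'def', 'defghi' or 'abcdef'.
for my $s (qw(cdefg abcdefghi def defghi abcdef)) {
    printf "%-12s %s\n", $s, hit(qr/\Bdef\B/, $s) ? 'matches' : 'no match';
}

# \B is also true between two NON-word characters, so with '&'
# (a non-word character) the results flip: spaces around it allow \B,
# letters around it create boundaries.
for my $s (' & ', 'a&b') {
    printf "%-12s %s\n", "'$s'", hit(qr/\B&\B/, $s) ? 'matches' : 'no match';
}
```

All of the book's examples put word characters around 'def', so they never exercise the \W-next-to-\W case that tripped up the original entity pattern.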
Re: Word boundary '\B' - Question by ikegami (Patriarch) on Aug 21, 2006 at 16:16 UTC
You might find the following to your liking:

`$str =~ s|(?<=\S)\&\#x0007E\;\&\#x0007E\;\&\#x0007E\;(?=\S)|~~~|g;`
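This uses the same positive lookarounds as Sidhekin's second version. The difference from the negative version only shows up when the entities sit at the very start or end of the string. A sketch comparing the two, where replace_neg, replace_pos and the edge-case string are hypothetical names and data:

```perl
use strict;
use warnings;

my $ent = '&#x0007E;' x 3;    # the three-entity sequence to replace

# Negative lookarounds (as in Sidhekin's first version):
# "not preceded or followed by whitespace".
sub replace_neg {
    my ($s) = @_;
    $s =~ s/(?<!\s)\Q$ent\E(?!\s)/~~~/g;
    return $s;
}

# Positive lookarounds (as in the reply above):
# "preceded and followed by a non-whitespace character".
sub replace_pos {
    my ($s) = @_;
    $s =~ s/(?<=\S)\Q$ent\E(?=\S)/~~~/g;
    return $s;
}

# Hypothetical edge case: the entities open the string.
my $edge = "${ent}abc";
print replace_neg($edge), "\n";    # prints ~~~abc (no character before counts as "not \s")
print replace_pos($edge), "\n";    # unchanged: there is no \S before position 0
```

On the original sample string both versions give 'abac &#x0007E;&#x0007E;&#x0007E; afa~~~f'; which one to prefer depends on whether an entity at the start or end of a line should be replaced.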
Re^2: Word boundary '\B' - Question by ysth (Canon) on Aug 21, 2006 at 21:32 UTC
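Since he specifically mentions letters, digits or underscores, I think \w would be more appropriate than \S. Because the pattern begins with '&' and ends with ';', both non-word characters, using (?<=\w) and (?=\w) there would be equivalent to:

`$str =~ s/\b\&\#x0007E\;\&\#x0007E\;\&\#x0007E\;\b/~~~/g;`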
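To compare the \w suggestion with the \S version, here is a sketch that applies the \w lookarounds, the \b form and the \S lookarounds to the original sample string and to a made-up string where the entities sit between hyphens. The sub names replace_w, replace_b and replace_S and the hyphen example are hypothetical.

```perl
use strict;
use warnings;

my $ent = '&#x0007E;' x 3;    # the three-entity sequence to replace

# Word-character lookarounds: replace only when a letter, digit or
# underscore sits directly on both sides.
sub replace_w { my ($s) = @_; $s =~ s/(?<=\w)\Q$ent\E(?=\w)/~~~/g; return $s }

# \b form: the sequence starts with '&' and ends with ';' (both \W), so
# \b at those edges can only succeed when the outer neighbour is \w.
sub replace_b { my ($s) = @_; $s =~ s/\b\Q$ent\E\b/~~~/g; return $s }

# Whitespace lookarounds, for comparison.
sub replace_S { my ($s) = @_; $s =~ s/(?<=\S)\Q$ent\E(?=\S)/~~~/g; return $s }

for my $s ("abac $ent afa${ent}f", "x-$ent-y") {
    printf "%s\n  \\w: %s\n  \\b: %s\n  \\S: %s\n",
        $s, replace_w($s), replace_b($s), replace_S($s);
}
```

All three agree on the original string. Only the \S version replaces the entities in 'x-...-y', because '-' is neither whitespace nor a word character.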