prasadbabu has asked for the wisdom of the Perl Monks concerning the following question:

Monks, today my colleague asked me to replace the entities &#x0007E;&#x0007E;&#x0007E; with ~~~ in a small file. On both sides of the entities there will be letters, digits, or underscores. So I wrote a small script, but I found strange behaviour.

So I tried a small sample string and got the same result, as shown below:

First tested:
-------------
$str = 'abac 123 afa123f';
$str =~ s|\B123\B|***|g;
print $str;

Output I got, as I expected:
----------------------------
abac 123 afa***f

Second tested:
--------------
$str = 'abac &#x0007E;&#x0007E;&#x0007E; afa&#x0007E;&#x0007E;&#x0007E;f';
$str =~ s|\B\&\#x0007E\;\&\#x0007E\;\&\#x0007E\;\B|~~~|g;
print $str;

Output I got:
-------------
abac ~~~ afa&#x0007E;&#x0007E;&#x0007E;f

Expected output:
----------------
abac &#x0007E;&#x0007E;&#x0007E; afa~~~f

For clarification I went through the documentation as well, and it says what I expected. Why this strange behaviour? Where am I going wrong?

Prasad

Re: Word boundary '\B' - Question
by Sidhekin (Priest) on Aug 21, 2006 at 15:29 UTC

    None of ' ', '&', and ';' is a word character, so there is no word boundary between them, and \B matches there.

    On the other hand, 'a' and 'f' are word characters (and '&' and ';' still aren't), so there are your word boundaries, and \B fails there.
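To make the point above concrete, here is a minimal sketch that runs both of the original substitutions. The helper name `replace_all` is made up for illustration, and the entity pattern is built with `quotemeta` rather than escaped by hand.

```perl
use strict;
use warnings;

# Hypothetical helper: apply a substitution to a copy of the input.
sub replace_all {
    my ($text, $pattern, $replacement) = @_;
    (my $copy = $text) =~ s/$pattern/$replacement/g;
    return $copy;
}

# Three escaped copies of the entity: \&\#x0007E\; repeated three times.
my $entity = quotemeta('&#x0007E;') x 3;

# '1' and '3' are word characters, so \B needs word characters next to them.
print replace_all('abac 123 afa123f', qr/\B123\B/, '***'), "\n";
# prints: abac 123 afa***f

# '&' and ';' are non-word characters, so \B needs non-word characters
# (or the edge of the string) next to them.
print replace_all('abac &#x0007E;&#x0007E;&#x0007E; afa&#x0007E;&#x0007E;&#x0007E;f',
                  qr/\B$entity\B/, '~~~'), "\n";
# prints: abac ~~~ afa&#x0007E;&#x0007E;&#x0007E;f
```

The same `\B` assertion therefore selects opposite neighbours depending on whether the pattern itself starts and ends with word characters.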

    You probably want either negative or positive lookahead/lookbehind for (non-)whitespace instead. Negative version:

    $str = 'abac &#x0007E;&#x0007E;&#x0007E; afa&#x0007E;&#x0007E;&#x0007E;f';
    $str =~ s|(?<!\s)\&\#x0007E\;\&\#x0007E\;\&\#x0007E\;(?!\s)|~~~|g;
    print $str;

    Positive version (note that positive lookahead/lookbehind won't match at the end/beginning of the string):

    $str = 'abac &#x0007E;&#x0007E;&#x0007E; afa&#x0007E;&#x0007E;&#x0007E;f';
    $str =~ s|(?<=\S)\&\#x0007E\;\&\#x0007E\;\&\#x0007E\;(?=\S)|~~~|g;
    print $str;
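The difference between the negative and positive versions only shows up at the edges of the string. The following sketch compares them on three inputs; the helper names `entity_pattern` and `matches` are invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: the escaped entity, repeated three times.
sub entity_pattern {
    my $one = quotemeta('&#x0007E;');
    return $one x 3;
}

# Hypothetical helper: 1 if $regex matches anywhere in $text, else 0.
sub matches {
    my ($text, $regex) = @_;
    return $text =~ $regex ? 1 : 0;
}

my $entity   = entity_pattern();
my $negative = qr/(?<!\s)$entity(?!\s)/;   # no whitespace on either side
my $positive = qr/(?<=\S)$entity(?=\S)/;   # a non-whitespace char on each side

for my $text ('afa&#x0007E;&#x0007E;&#x0007E;f',       # embedded in a word
              '&#x0007E;&#x0007E;&#x0007E;',            # the whole string
              'abac &#x0007E;&#x0007E;&#x0007E; afa') { # surrounded by spaces
    printf "%-40s negative=%d positive=%d\n",
        $text, matches($text, $negative), matches($text, $positive);
}
```

Only the middle input separates the two: the negative form accepts an entity run that touches the start or end of the string, while the positive form requires an actual character on each side.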

    print "Just another Perl ${\(trickster and hacker)},"
    The Sidhekin proves Sidhe did it!

      Sidhekin, thanks for the clarification.

      I was misled by the following line from the Perl in 21 days book:

      /\Bdef\B/ matches cdefg or abcdefghi, but not def, defghi, or abcdef.

      But now that I have re-read the documentation, I get a clear answer.

      From the documentation:

      A word boundary (\b) is a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order).
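The quoted definition can be checked directly. The sketch below applies it by hand, treating the start and end of the string as if a \W were there, and compares the result with the positions where Perl's own `\b` matches. All three sub names are made up for illustration.

```perl
use strict;
use warnings;

# Hand-written version of the definition: a boundary is a spot with \w on
# exactly one side. The edges of the string count as non-word.
sub is_word_boundary {
    my ($text, $pos) = @_;
    my $before = $pos > 0             ? substr($text, $pos - 1, 1) : '';
    my $after  = $pos < length($text) ? substr($text, $pos, 1)     : '';
    my $word_before = $before =~ /\w/ ? 1 : 0;
    my $word_after  = $after  =~ /\w/ ? 1 : 0;
    return $word_before != $word_after ? 1 : 0;
}

# Every position in the string that satisfies the definition.
sub boundary_positions {
    my ($text) = @_;
    return grep { is_word_boundary($text, $_) } 0 .. length($text);
}

# The positions where the regex engine itself matches \b.
sub regex_boundary_positions {
    my ($text) = @_;
    my @positions;
    push @positions, pos($text) while $text =~ /\b/g;
    return @positions;
}

my $sample = 'afa&#x0007E;f';
print "by definition: @{[ boundary_positions($sample) ]}\n";
print "by regex \\b:   @{[ regex_boundary_positions($sample) ]}\n";
# both lines list: 0 3 5 11 12 13
```

Positions 3 and 12 are the spots just before '&' and just after ';', which is why `\B` cannot match on either side of an entity that sits inside a word.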

      Prasad

Re: Word boundary '\B' - Question
by ikegami (Patriarch) on Aug 21, 2006 at 16:16 UTC

    You might find the following to your liking:

    $str =~ s|(?<=\S)\&\#x0007E\;\&\#x0007E\;\&\#x0007E\;(?=\S)|~~~|g;
      Since he particularly mentions alphanumeric or underscore, I think \w would be more appropriate than \S, which would be equivalent to:

      $str =~ s/\b&#x0007E;&#x0007E;&#x0007E;\b/~~~/g;
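The claimed equivalence holds because the pattern begins with '&' and ends with ';', both non-word characters, so `\b` at either end can only succeed when the neighbouring character is a word character. A small sketch that compares the variants (the sub names are hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: the escaped entity, repeated three times.
sub entity_pattern {
    my $one = quotemeta('&#x0007E;');
    return $one x 3;
}

# \w lookarounds: a word character is required on each side.
sub with_lookaround {
    my ($text) = @_;
    my $entity = entity_pattern();
    $text =~ s/(?<=\w)$entity(?=\w)/~~~/g;
    return $text;
}

# \b on each side: equivalent here because the pattern starts with '&'
# and ends with ';'.
sub with_boundary {
    my ($text) = @_;
    my $entity = entity_pattern();
    $text =~ s/\b$entity\b/~~~/g;
    return $text;
}

# \S lookarounds: any non-whitespace neighbour is accepted, punctuation too.
sub with_non_space {
    my ($text) = @_;
    my $entity = entity_pattern();
    $text =~ s/(?<=\S)$entity(?=\S)/~~~/g;
    return $text;
}

for my $text ('afa&#x0007E;&#x0007E;&#x0007E;f', '(&#x0007E;&#x0007E;&#x0007E;)') {
    print join(' | ', with_lookaround($text), with_boundary($text),
                      with_non_space($text)), "\n";
}
```

The parenthesised input is where `\S` and `\w` part ways: only the `\S` version rewrites it, so the choice depends on whether punctuation next to the entities should count.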