Good catch!
> my conclusion is that the only way to handle the OP's problem in a way fully consistent with \b{wb} semantics is to just split using it, and maybe repack non-word fragments afterwards
My intuition says: split on non-words like whitespace, reject "words" that contain no \w or equivalent characters, and repack the rest afterwards.
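A minimal sketch of that idea, not a finished solution: `split_and_repack` is a name I made up, and treating "contains at least one \w" as the test for a word is my assumption. Note that punctuation glued to a word (like a trailing comma) stays with the word here.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [type, text] pairs,
# where type is 'word' or 'gap'.
sub split_and_repack {
    my ($text) = @_;
    my @out;
    # The capturing group makes split return the whitespace runs too.
    for my $frag ( split /(\s+)/, $text ) {
        next if $frag eq '';    # leading empty field when text starts with whitespace
        # Assumption: a fragment is a "word" if it has at least one \w character.
        my $type = $frag =~ /\w/ ? 'word' : 'gap';
        if ( $type eq 'gap' and @out and $out[-1][0] eq 'gap' ) {
            $out[-1][1] .= $frag;    # repack adjacent non-word fragments
        }
        else {
            push @out, [ $type, $frag ];
        }
    }
    return @out;
}

my @tokens = split_and_repack("foo -- bar,\nbaz");
print "$_->[0]\t[$_->[1]]\n" for @tokens;
```

Joining the text parts of all pairs gives back the original string, so nothing is lost in the repacking.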
I doubt it's possible to cover all desirable edge cases with \b{wb}; this will depend on the user's perspective, especially in multi-language environments and with Unicode.
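To see what \b{wb} considers a boundary for your own data, something like the following may help. It is only a sketch: `wb_fragments` is a hypothetical name, the sample string is arbitrary, and the word/gap labelling via \w is my assumption. \b{wb} needs Perl 5.22 or later, and as far as I know the handling of whitespace runs changed in 5.24, so results can differ between versions.

```perl
use 5.022;    # \b{wb} is not available before Perl 5.22
use warnings;

# Hypothetical helper: split at every Unicode word boundary and
# label each fragment as 'word' or 'gap'.
sub wb_fragments {
    my ($text) = @_;
    my @frags = split /\b{wb}/, $text;
    # Assumption: a fragment counts as a word if it contains a \w character.
    return map { [ ( /\w/ ? 'word' : 'gap' ), $_ ] } @frags;
}

my @frags = wb_fragments("don't stop, 3.14 times");
say "$_->[0]\t[$_->[1]]" for @frags;
```

Whether the fragments you get for apostrophes, decimal points or hyphens match what you call a "word" is exactly the perspective question.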
Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
In reply to Re^6: Splitting multiline string into words, the stuff between words, and newlines
by LanX
in thread Splitting multiline string into words, the stuff between words, and newlines
by ibm1620