An example showing how to split ...
That's actually still not that easy. Perl complying with the Unicode standard doesn't by itself make it trivial:
DB<13> $str = "I don't think 'don't' isn't a word"
DB<14> x split /\b{wb}/, $str
0  'I'
1  ' '
2  'don\'t'
3  ' '
4  'think'
5  ' '
6  '\''
7  'don\'t'
8  '\''
9  ' '
10  'isn\'t'
11  ' '
12  'a'
13  ' '
14  'word'
Like this, with $str expanded to cover more edge cases:
DB<27> $str = "I don't think, 'don't' isn't a word..."
DB<28> x @list= split /\b{wb}/, $str
0  'I'
1  ' '
2  'don\'t'
3  ' '
4  'think'
5  ','
6  ' '
7  '\''
8  'don\'t'
9  '\''
10  ' '
11  'isn\'t'
12  ' '
13  'a'
14  ' '
15  'word'
16  '.'
17  '.'
18  '.'
DB<29> x grep { not /^\W|\s+$/ } @list
0  'I'
1  'don\'t'
2  'think'
3  'don\'t'
4  'isn\'t'
5  'a'
6  'word'
FWIW, grep { ! /^\W+$/ } yields the same result, but I'm not convinced the example already covers all edge cases...
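For anyone who wants to try this outside the debugger, here is a minimal standalone sketch of the same split-then-filter idea. The variable names @tokens and @words are just illustrative choices, and it assumes Perl 5.22 or newer, since that is where \b{wb} was introduced.

```perl
use v5.22;      # \b{wb} needs Perl 5.22 or newer
use warnings;

my $str = "I don't think, 'don't' isn't a word...";

# Split at Unicode word boundaries. This yields words, but also
# whitespace and punctuation as separate tokens.
my @tokens = split /\b{wb}/, $str;

# Drop every token that consists only of non-word characters
# (spaces, commas, quotes, dots).
my @words = grep { !/^\W+$/ } @tokens;

say join '|', @words;   # I|don't|think|don't|isn't|a|word
```

The filter is deliberately simple and only as good as the test string; it is not meant as a general word tokenizer.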
"Francis' car" is an example of what would still fail: the apostrophe will not be part of the first word after splitting. Admittedly a tough problem.
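A small sketch to reproduce that failing case, again assuming Perl 5.22 or newer (the name @tokens is only illustrative). It prints each token in brackets so the whitespace stays visible:

```perl
use v5.22;      # \b{wb} needs Perl 5.22 or newer
use warnings;

my $str = "Francis' car";

# Same split as before, at Unicode word boundaries.
my @tokens = split /\b{wb}/, $str;

# Show each token in brackets so spaces and quotes are easy to spot.
say "[$_]" for @tokens;

# The possessive apostrophe is not attached to the name:
# the first token is the bare word.
say "first word: $tokens[0]";
```

A trailing apostrophe has no letter after it, so the word-boundary rules treat it as punctuation rather than as part of the word. Telling a possessive from a closing quote needs context that a boundary assertion alone does not have.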
Cheers Rolf
(addicted to the Perl Programming Language :)
see Wikisyntax for the Monastery
In reply to Re^2: perlretut - Perl regular expressions tutorial curveball
by LanX
in thread perlretut - Perl regular expressions tutorial curveball
by Cow1337killr