in reply to perlretut - Perl regular expressions tutorial curveball
But a single quote "'" is not an alphanumeric character² in the class \w; it belongs to the opposite class \W!
Hence in the example only "don" would be matched. ¹
(To be precise, "don't" should be written with an apostrophe, not a quote, but yeah, computers, you know ;)
Perl 5.22 introduced \b{wb}, which handles characters that appear inside words of "natural languages" like English according to Unicode rules.
With \b{wb}, "don't" can be matched as a whole word.
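A minimal sketch of the difference, assuming Perl 5.22 or later; the variable names are only for illustration:

```perl
use v5.22;          # \b{wb} needs Perl 5.22 or later
use strict;
use warnings;

my $text = "don't";

# Plain \b: the quote is a \W character, so a boundary falls right after "don".
my ($plain) = $text =~ /^(.+?)\b/;

# \b{wb}: Unicode word-boundary rules keep the apostrophe inside the word.
my ($unicode) = $text =~ /^(.+?)\b{wb}/;

say "plain \\b: $plain";     # don
say "\\b{wb}:   $unicode";   # don't
```

The lazy (.+?) stops at the first boundary it meets, so the captured text shows where each kind of boundary falls.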
Hope it's clearer now.
perlretut is a tutorial; you need to look up details in perlre, which is the central reference. After searching for "wb" there you'll be referred to perlrebackslash, which details the backslash sequences.
In this case the discussion is found in the section "\b{}, \b, \B{}, \B",
which points to the explicit details:
> To get better word matching of natural language text, see "\b{wb}" below.
Cheers Rolf
(addicted to the Perl Programming Language :)
see Wikisyntax for the Monastery
¹)
  DB<35> x "don't" =~ / (.+?) (\b) /x
  0  'don'
  1  ''
  DB<36> x "don't" =~ / (.+?) (\b) /xg
  0  'don'
  1  ''      # boundaries always empty
  2  '\''
  3  ''
  4  't'
  5  ''
²) actually this is also more complicated ...
\w [3] Match a "word" character (alphanumeric plus "_", plus other connector punctuation chars plus Unicode marks)
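A small sketch of what that definition means in practice. The sample characters are my own picks, not from the docs: U+203F (undertie) as connector punctuation and U+0301 (combining acute accent) as a Unicode mark.

```perl
use strict;
use warnings;

# Hypothetical sample set: label => single character to classify.
my %samples = (
    'underscore'               => "_",
    'apostrophe'               => "'",
    'undertie (U+203F)'        => "\x{203F}",   # connector punctuation
    'combining acute (U+0301)' => "\x{0301}",   # Unicode mark
);

for my $name (sort keys %samples) {
    # Classify the character as \w or \W.
    my $class = $samples{$name} =~ /^\w$/ ? '\w' : '\W';
    print "$name: $class\n";
}
```

Only the apostrophe should land in \W here; the other three count as "word" characters.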