in reply to Re^11: Seeking Perl docs about how UTF8 flag propagates (Terminology)
in thread Seeking Perl docs about how UTF8 flag propagates
That's wrong.
It doesn't "default to ASCII". It operates on decoded text, i.e. a string of Unicode code points. Always. This can be demonstrated using "\N{U+100}" =~ /\w/ (which matches). You need to use /a to limit it to the ASCII range.
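A minimal sketch of that demonstration, assuming Perl 5.14 or later (needed for the /a modifier). The variable names are illustrative only. U+0100 (LATIN CAPITAL LETTER A WITH MACRON) is a letter, so \w matches it under the default rules, and /a takes that away:

```perl
use v5.14;    # /a modifier needs 5.14+; also enables strict
use warnings;

# U+0100 is a Unicode letter far outside the ASCII range.
my $char = "\N{U+100}";

# Default rules: \w is matched against the Unicode properties of the code point.
my $matches_default = ( $char =~ /\w/ )  ? 1 : 0;

# /a restricts \w to [A-Za-z0-9_].
my $matches_ascii   = ( $char =~ /\w/a ) ? 1 : 0;

print "default: $matches_default\n";    # default: 1
print "with /a: $matches_ascii\n";      # with /a: 0
```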
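Text encoded using ASCII happens to work because $x eq encode( "US-ASCII", $x ) for any string $x made up only of ASCII characters.
Text encoded using iso-latin-1 happens to work because $x eq encode( "iso-latin-1", $x ) for any string $x made up only of characters in U+0000..U+00FF (but see the last paragraph).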
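A small sketch of those two identities using the Encode module. It uses the canonical name "iso-8859-1" for the encoding called iso-latin-1 above; the sample strings and variable names are made up for illustration. The last case shows the identity breaking once a character falls outside U+0000..U+00FF:

```perl
use v5.14;
use warnings;
use Encode qw( encode );

# ASCII-only text: each code point encodes to a byte of the same value.
my $ascii_text = "Hello, world";
my $ascii_same = $ascii_text eq encode( "US-ASCII", $ascii_text );

# Text limited to U+0000..U+00FF: the same holds for ISO-8859-1,
# e.g. U+00E9 encodes to the byte 0xE9.
my $latin1_text = "caf\N{U+E9}";
my $latin1_same = $latin1_text eq encode( "iso-8859-1", $latin1_text );

# U+0100 has no ISO-8859-1 byte, so by default encode() emits a
# substitution character and the result no longer equals the input.
my $wide_text = "\N{U+100}";
my $wide_same = $wide_text eq encode( "iso-8859-1", $wide_text );

printf "ASCII identity:      %s\n", $ascii_same  ? "yes" : "no";    # yes
printf "Latin-1 identity:    %s\n", $latin1_same ? "yes" : "no";    # yes
printf "U+0100 via Latin-1:  %s\n", $wide_same   ? "yes" : "no";    # no
```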
Those are just side effects of \w working on decoded text.
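There was a bug where \w sometimes failed to match characters in U+0080..U+00FF, depending on how the string happened to be stored internally. This was fixed 12 years ago, in 2011 (Perl 5.14). Add use v5.14; (which enables the unicode_strings feature) to get the fix.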
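A sketch of the bug and the fix, assuming Perl 5.14 or later. It uses `no feature 'unicode_strings'` inside a block to emulate the old behaviour, and utf8::upgrade to change only the internal storage of an otherwise identical string; the variable names are illustrative:

```perl
use v5.14;    # enables the unicode_strings feature (among others)
use warnings;

my ( $matches_downgraded, $matches_upgraded );
{
    # Emulate the pre-fix behaviour for regexes compiled in this block.
    no feature 'unicode_strings';

    my $down = "\xE9";       # U+00E9, stored as a single byte (UTF8 flag off)
    my $up   = "\xE9";
    utf8::upgrade($up);      # same string, stored as UTF-8 (UTF8 flag on)

    # Same characters, different results: only the storage format differs.
    $matches_downgraded = ( $down =~ /\w/ ) ? 1 : 0;
    $matches_upgraded   = ( $up   =~ /\w/ ) ? 1 : 0;
}

# With unicode_strings in effect, storage no longer matters.
my $plain         = "\xE9";
my $matches_fixed = ( $plain =~ /\w/ ) ? 1 : 0;

print "old, downgraded: $matches_downgraded\n";    # old, downgraded: 0
print "old, upgraded:   $matches_upgraded\n";      # old, upgraded:   1
print "with v5.14:      $matches_fixed\n";         # with v5.14:      1
```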
Re^13: Seeking Perl docs about how UTF8 flag propagates (Terminology)
by LanX (Saint) on May 22, 2023 at 20:50 UTC
by ikegami (Patriarch) on May 23, 2023 at 01:27 UTC
by LanX (Saint) on May 23, 2023 at 10:18 UTC
by ikegami (Patriarch) on May 23, 2023 at 14:40 UTC