Re^11: Seeking Perl docs about how UTF8 flag propagates (Terminology)

> For example, \w only works when applied to a string of decoded text.

I think it's better to say that \w defaults to ASCII.

So if the encoded text is ASCII it'll "work".

If it's Latin-1, \w won't match the extra alphanumerics.

Cheers Rolf
(addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
Wikisyntax for the Monastery

Re^12: Seeking Perl docs about how UTF8 flag propagates (Terminology) by ikegami (Patriarch) on May 22, 2023 at 16:56 UTC
That's wrong. It doesn't "default to ASCII". It works against decoded text, that is, a string of Unicode Code Points. Always. This can be demonstrated using `"\N{U+100}" =~ /\w/` (which matches). You need to use `/a` to limit it to the ASCII range.

Text encoded using ASCII happens to work because `$x eq encode( "US-ASCII", $x )`. Text encoded using iso-latin-1 happens to work because `$x eq encode( "iso-latin-1", $x )` (though do see last paragraph). Those are just side effects of `\w` working on decoded text.

There was a bug where `\w` sometimes didn't work for characters in U+0080..U+00FF. This was fixed 12 years ago, in 2011. Add `use v5.14;` to get the fix.
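A minimal sketch of the claims above, for anyone who wants to try them. The variable names are made up for illustration, `iso-8859-1` is used as the canonical name for Latin-1, and `/a` needs Perl 5.14 or later.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A code point well outside ASCII and Latin-1.
my $a_macron = "\N{U+100}";    # LATIN CAPITAL LETTER A WITH MACRON

# \w sees it as a word character; /a restricts \w to the ASCII range.
my $plain_match = $a_macron =~ /\w/  ? 1 : 0;
my $ascii_match = $a_macron =~ /\w/a ? 1 : 0;

# For text in the ASCII range, encoding leaves the string unchanged.
my $ascii_text = "Perl";
my $ascii_same = $ascii_text eq encode( "US-ASCII", $ascii_text ) ? 1 : 0;

# Likewise for Latin-1: each code point 0..255 becomes the byte of the same value.
my $latin1_text = "caf\xE9";
my $latin1_same = $latin1_text eq encode( "iso-8859-1", $latin1_text ) ? 1 : 0;

print "plain \\w on U+0100:  $plain_match\n";    # 1
print "\\w with /a:          $ascii_match\n";    # 0
print "ASCII round trip:    $ascii_same\n";      # 1
print "Latin-1 round trip:  $latin1_same\n";     # 1
```

The two "round trip" checks only show why encoded ASCII or Latin-1 text looks the same as decoded text to the regex engine; they say nothing about other encodings such as UTF-8.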
Re^13: Seeking Perl docs about how UTF8 flag propagates (Terminology) by LanX (Saint) on May 22, 2023 at 20:50 UTC
> There was a bug where \w didn't work for characters in U+0080..U+00FF sometimes. This was fixed 12 years ago in 2011. Add use v5.14; to get the fix.

I call default anything without pragmas.

But we can agree that the default is buggy.

Cheers Rolf
(addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
Wikisyntax for the Monastery
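The U+0080..U+00FF behaviour being argued about can be sketched roughly as follows. This assumes Perl 5.14 or later, no locale pragma in effect, and a string that has not been upgraded internally; the variable names are illustrative only. `unicode_strings` is the feature that `use v5.14;` turns on as part of its bundle.

```perl
use strict;
use warnings;

my $e_acute = "\xE9";    # U+00E9, in a string that is not stored as UTF-8 internally

# No unicode_strings in scope: the old behaviour still applies here.
my $without_feature = $e_acute =~ /\w/ ? 1 : 0;

my $with_feature;
{
    use feature 'unicode_strings';    # lexically scoped to this block
    $with_feature = $e_acute =~ /\w/ ? 1 : 0;
}

# Same character, same pattern, but the string is now stored as UTF-8 internally.
my $upgraded = $e_acute;
utf8::upgrade($upgraded);
my $after_upgrade = $upgraded =~ /\w/ ? 1 : 0;

print "without unicode_strings: $without_feature\n";    # 0
print "with unicode_strings:    $with_feature\n";       # 1
print "after utf8::upgrade:     $after_upgrade\n";      # 1
```

The first and third results differ even though the two strings compare equal with `eq`, which is why the old behaviour is called a bug rather than a default.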
Re^14: Seeking Perl docs about how UTF8 flag propagates (Terminology) by ikegami (Patriarch) on May 23, 2023 at 01:27 UTC
It doesn't "default to ASCII". It works against decoded text, that is, a string of Unicode Code Points. Always. Even without pragmas. This can be demonstrated using `"\N{U+100}" =~ /\w/` (which matches). You need to use `/a` to limit it to the ASCII range.
Re^15: Seeking Perl docs about how UTF8 flag propagates (Terminology) by LanX (Saint) on May 23, 2023 at 10:18 UTC
Re^16: Seeking Perl docs about how UTF8 flag propagates (Terminology) by ikegami (Patriarch) on May 23, 2023 at 14:40 UTC

