Re^6: UTF8 versus \w in pattern matching (basic test)

That's not strange. You're seeing Unicode codepoints, which for the characters in question happen to be identical to their ISO-8859-1 encodings. Add "\N{EURO SIGN}" to the string and you get "\x{20ac}": again, that is the codepoint, not a UTF-8 encoding.
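To make the difference visible, here is a minimal sketch. The sample string is my own assumption (two characters from the Latin-1 range plus the euro sign), not the string from the earlier posts. It prints the codepoints of the character string next to the bytes you get after explicitly encoding it with the core Encode module.

```perl
use strict;
use warnings;
use charnames ':full';    # needed for \N{...} names on perls older than 5.16
use Encode qw(encode);

# Hypothetical sample: a-umlaut, u-umlaut, euro sign (a character string).
my $latin = "\x{e4}\x{fc}";
my $str   = $latin . "\N{EURO SIGN}";

# Codepoints of the character string, as hex.
my @codepoints = map { sprintf '%x', ord($_) } split //, $str;

# Bytes of the same characters after explicit encoding.
my @utf8   = map { sprintf '%02x', ord($_) } split //, encode('UTF-8', $str);
my @latin1 = map { sprintf '%02x', ord($_) } split //, encode('ISO-8859-1', $latin);

print "codepoints: @codepoints\n";    # e4 fc 20ac
print "UTF-8:      @utf8\n";          # c3 a4 c3 bc e2 82 ac
print "ISO-8859-1: @latin1\n";        # e4 fc
```

For the first two characters the codepoints match the ISO-8859-1 bytes, while the UTF-8 bytes are different. For the euro sign the codepoint 20ac matches neither.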

"Everything is UTF-8" is one of the most frequent false assumptions I encounter when dealing with non-ASCII characters.

Re^7: UTF8 versus \w in pattern matching (basic test) by jo37 (Curate) on Jul 06, 2021 at 18:03 UTC
Thanks for the clarification. Greetings, -jo `$gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$`