in reply to Re^5: UTF8 versus \w in pattern matching (basic test)
in thread UTF8 versus \w in pattern matching
That's not strange. You're seeing Unicode code points, which for the characters in question happen to be identical to their ISO-8859-1 encodings. Add "\N{EURO SIGN}" to the string and you get "\x{20ac}". Again, that is the code point, not a UTF-8 encoding.
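A minimal sketch to make the difference visible, assuming a string of my own choosing (e-acute plus the euro sign, not the string from the parent post) and the core Encode module. It prints the code points of the character string next to the octets of its UTF-8 encoding:

```perl
use strict;
use warnings;
use charnames ':full';    # needed for \N{EURO SIGN} on perls older than 5.16
use Encode qw(encode);

# Character string (illustrative, not from the thread):
# e-acute (U+00E9) followed by the euro sign (U+20AC).
my $chars = "\x{e9}\N{EURO SIGN}";

# One code point per character.
my @codepoints = map { sprintf 'U+%04X', ord } split //, $chars;

# The UTF-8 encoding is a different thing: a string of octets.
my $bytes  = encode('UTF-8', $chars);
my @octets = map { sprintf '%02X', ord } split //, $bytes;

# For e-acute alone, the ISO-8859-1 octet equals the code point.
my $latin1 = sprintf '%02X', ord encode('ISO-8859-1', "\x{e9}");

print "characters: ", length($chars), " (@codepoints)\n";
print "octets:     ", length($bytes), " (@octets)\n";
print "e-acute in ISO-8859-1: $latin1\n";
```

The two characters become five octets in UTF-8 (C3 A9 for e-acute, E2 82 AC for the euro sign), while the single ISO-8859-1 octet for e-acute is E9, the same number as its code point.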
"Everything is UTF-8" is one of the most frequent false assumptions I encounter when dealing with non-ASCII characters.
Re^7: UTF8 versus \w in pattern matching (basic test)
by jo37 (Curate) on Jul 06, 2021 at 18:03 UTC |