Why?! Before decoding the utf8 string, how could the string go from input to output unchanged but fail to match the regex?

Basically, that's because Perl by default assumes a binary string is in Latin-1 rather than UTF-8. And that's a problem: every byte string in any encoding (UTF-8 or anything else) is also valid Latin-1, so Perl never complains and silently misinterprets the bytes.
The character \x{b5} (MICRO SIGN) is one byte in Latin-1, but two bytes (\xC2\xB5) in UTF-8. And \x{3bc} (GREEK SMALL LETTER MU) is simply too big to fit in a one-byte encoding at all.
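To make the byte-level difference concrete, here is a small sketch in Python (used here only because it makes the raw bytes easy to inspect; the encodings themselves behave identically from Perl):

```python
# U+00B5 MICRO SIGN fits in a single Latin-1 byte, but needs two bytes in UTF-8.
micro = "\u00b5"                                    # µ
assert micro.encode("latin-1") == b"\xb5"           # one byte in Latin-1
assert micro.encode("utf-8") == b"\xc2\xb5"         # two bytes in UTF-8

# U+03BC GREEK SMALL LETTER MU is above 0xFF: no Latin-1 byte exists for it.
mu = "\u03bc"                                       # μ
try:
    mu.encode("latin-1")
except UnicodeEncodeError:
    print("U+03BC does not fit in a one-byte encoding")
```

So a byte string holding UTF-8 data for µ contains \xC2\xB5, not \xB5, and it can never contain μ as a single character at all.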
Why do I need to decode the utf8 string to match an utf8 character

If you have a string of UTF-8 bytes and want to apply regexes to it, get its length in characters, etc., you always have to decode it first. The reason is backwards compatibility: Perl is old. Other languages (Python, Ruby) broke compatibility to get better Unicode handling; Perl didn't.
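The original failure mode can be reproduced outside Perl too. A minimal Python sketch of why the undecoded bytes never match:

```python
import re

raw = "\u03bc".encode("utf-8")      # b'\xce\xbc': the UTF-8 bytes for μ

# Perl's default view: treat the raw bytes as Latin-1. The two bytes become
# the two characters 'Î' and '¼', so a regex looking for μ finds nothing.
as_latin1 = raw.decode("latin-1")
assert re.search("\u03bc", as_latin1) is None

# After an explicit UTF-8 decode the string is one character, and it matches.
as_utf8 = raw.decode("utf-8")
assert re.search("\u03bc", as_utf8) is not None
print("matches only after decode")
```

The string "goes from input to output unchanged" because printing the undecoded bytes to a UTF-8 terminal re-emits the same bytes, which the terminal then renders correctly; only character-level operations like regex matching expose the mislabeled encoding.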
In reply to Re: Matching/replacing a unicode character only works after decode()
by Anonymous Monk
in thread Matching/replacing a unicode character only works after decode()
by FloydATC