in reply to Re^6: Malformed UTF-8
in thread Malformed UTF-8
It appears $term is not actually UTF-8 encoded when this occurs.
No, it IS UTF-8 encoded; Perl just doesn't know that it is, so it treats every byte as a separate character. And that can cause all kinds of crap. If you're reading $term from a handle (or reading any string from an encoded handle), you should set the handle's encoding with binmode (i.e. binmode HANDLE, ":utf8";) before reading from it. Or you can specify the :utf8 layer when you open() the file.
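To make the difference visible, here is a minimal sketch. The temporary file and the variable names ($raw_term, $late_term) are made up for illustration, and the open() call uses :encoding(UTF-8), the validating variant of the :utf8 layer; treat it as a demonstration under those assumptions, not as the original poster's code.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the UTF-8 bytes for "baño" (ñ is the two bytes C3 B1) to a temp file.
my ($tmp_fh, $path) = tempfile();
binmode $tmp_fh;                          # raw bytes, no encoding layer
print {$tmp_fh} "ba\xC3\xB1o\n";
close $tmp_fh or die "close: $!";

# 1. No layer: Perl hands back five bytes, each treated as one character.
open my $raw, '<', $path or die "open: $!";
chomp(my $raw_term = <$raw>);
close $raw;

# 2. Layer given to open(): bytes are decoded into characters while reading.
open my $dec, '<:encoding(UTF-8)', $path or die "open: $!";
chomp(my $term = <$dec>);
close $dec;

# 3. Layer added with binmode after open(), but before the first read.
open my $late, '<', $path or die "open: $!";
binmode $late, ':utf8';
chomp(my $late_term = <$late>);
close $late;

printf "raw: %d, open layer: %d, binmode: %d\n",
    length($raw_term), length($term), length($late_term);
# prints "raw: 5, open layer: 4, binmode: 4"

unlink $path;
```

The undecoded string is one character too long because the two bytes of ñ are counted separately; that mismatch is what bites later when the string is compared, matched or printed.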
About the [UTF8 "ba\x{f1}o"]: note that \x{f1} does NOT specify an encoding. It's the literal notation for the character with code point 241 (U+00F1) in Unicode, which is also character 241 in Latin-1, i.e. "ñ" eq "\x{f1}". The advantage is that the notation itself is 7-bit ASCII, so it displays correctly (almost) everywhere, no matter whether your output expects UTF-8, Latin-1, Latin-9 (ISO-8859-15), etc.
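A short sketch of that point, assuming the core Encode module is available (the variable names are illustrative only): the same character string yields different bytes depending on which encoding you ask for, which shows that \x{f1} names a character rather than a byte sequence.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $term = "ba\x{f1}o";   # four characters; \x{f1} is code point 241 (U+00F1)

printf "characters: %d, code point of 3rd: %d\n",
    length($term), ord(substr($term, 2, 1));
# prints "characters: 4, code point of 3rd: 241"

# Encoding only happens here, when characters are turned into bytes.
my $utf8   = encode('UTF-8',      $term);   # ñ becomes two bytes
my $latin1 = encode('ISO-8859-1', $term);   # ñ becomes one byte

printf "UTF-8:   %s\n", join ' ', map { sprintf('%02X', ord($_)) } split //, $utf8;
printf "Latin-1: %s\n", join ' ', map { sprintf('%02X', ord($_)) } split //, $latin1;
# prints "UTF-8:   62 61 C3 B1 6F"
# prints "Latin-1: 62 61 F1 6F"
```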
|
Re^8: Malformed UTF-8
by spiros (Beadle) on May 15, 2007 at 17:51 UTC |