in reply to Re^6: Malformed UTF-8
in thread Malformed UTF-8

It appears $term is not actually UTF-8 encoded when this occurs.

No, it IS UTF-8 encoded; perl just doesn't know that it is, so it treats the data as raw bytes, one character per byte. And that can cause all kinds of crap. If you're reading $term from a handle (or reading any string from an encoded handle), you should set the handle's encoding with binmode (e.g. binmode HANDLE, ":utf8";) before reading from it. Or you can specify the :utf8 layer when you open() the file.
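A minimal sketch of the difference, assuming the input really is UTF-8. The sample bytes and the in-memory handles are made up for illustration and stand in for whatever file or socket $term actually comes from. I use :encoding(UTF-8) on open(), which is the stricter, validating variant of the :utf8 layer:

```perl
use strict;
use warnings;

# Hypothetical sample data: the UTF-8 bytes of "baño" (ñ is the two bytes C3 B1).
my $bytes = "ba\xc3\xb1o";

# No layer: perl hands back raw bytes and counts each byte as a character.
open(my $raw, '<', \$bytes) or die "open: $!";
my $undecoded = <$raw>;
close $raw;

# Layer given at open(): the bytes are decoded into characters while reading.
open(my $in, '<:encoding(UTF-8)', \$bytes) or die "open: $!";
my $term = <$in>;
close $in;

# Layer set with binmode on a handle that is already open, before the first read.
open(my $in2, '<', \$bytes) or die "open: $!";
binmode($in2, ':utf8');
my $term2 = <$in2>;
close $in2;

printf "undecoded=%d decoded=%d binmode=%d\n",
    length $undecoded, length $term, length $term2;
```

The first length counts bytes and the other two count characters, so the undecoded string comes out one longer for every two-byte character such as ñ.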

About the [UTF8 "ba\x{f1}o"] - note that \x{f1} does NOT specify an encoding. It's the escape notation for the character with code point 241 (U+00F1) in Unicode, which is the same code point in Latin-1, i.e. "ñ" eq "\x{f1}". The advantage is that the escape itself is 7-bit ASCII, so it will print correctly (almost) everywhere, no matter whether your output expects UTF-8, Latin-1, Latin-9 (ISO-8859-15), etc.
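To make the "character, not encoding" point concrete, here is a small sketch using the core Encode module. The variable names are mine. The same four-character string only turns into a particular byte sequence once you pick an encoding:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Four characters; \x{f1} names code point U+00F1 and says nothing about bytes.
my $word = "ba\x{f1}o";

# Choosing an encoding is what decides the bytes.
my $as_utf8   = encode('UTF-8', $word);       # ñ becomes two bytes: C3 B1
my $as_latin1 = encode('ISO-8859-1', $word);  # ñ becomes one byte: F1

printf "chars=%d utf8_bytes=%d latin1_bytes=%d\n",
    length $word, length $as_utf8, length $as_latin1;
```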

Re^8: Malformed UTF-8
by spiros (Beadle) on May 15, 2007 at 17:51 UTC
    Thank you very much. This might indeed be the root of the problem. I will have a closer look.