in reply to Re^2: Writing unicode characters to file using open($fh, ">:utf8, $name) mangles unicode?

encode_utf8 does not do what you think it does. You need to decode() to the internal text format. That also means you still need to know what the original encoding is and Encode::decode() needs to support that format.

update: very compactly:

Encode::encode() and friends translate Perl text strings into binary strings in some external encoding.

Encode::decode() and friends translate binary strings in some external encoding into Perl text strings.
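
To make the two directions concrete, here is a minimal sketch using only the core Encode module. The sample bytes and the variable names are made up for illustration, and ISO-8859-1 is just an assumed source encoding. The last step shows the kind of mangling you get when bytes that are already UTF-8 are encoded a second time.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the bytes of "café" as received in ISO-8859-1.
my $bytes = "caf\xE9";

# decode(): bytes in a known external encoding -> Perl text string.
my $text = decode('ISO-8859-1', $bytes);

# encode(): Perl text string -> bytes in an external encoding.
my $utf8 = encode('UTF-8', $text);

# Pitfall: a string that already holds UTF-8 bytes is treated as
# one character per byte and encoded again.
my $mangled = encode('UTF-8', $utf8);

# Prints: 4 chars, 5 UTF-8 bytes, 7 bytes after double encoding
printf "%d chars, %d UTF-8 bytes, %d bytes after double encoding\n",
    length $text, length $utf8, length $mangled;
```

The 7-byte result is the classic "Ã©" mojibake: each of the two bytes of the UTF-8 "é" has been expanded into two bytes of its own.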

Re^4: Writing unicode characters to file using open($fh, ">:utf8, $name) mangles unicode?
by telcontar (Beadle) on Aug 09, 2007 at 04:17 UTC
    You need to decode() to the internal text format.
    I realize I should have commented on the code. That's what HTTP::Response->decoded_content() does. It determines the character set from the Content-Type header (though I think it does not take <meta http-equiv="content-type" content="text/html; charset=XXX"> into account, but that's another story). So I decode it to the internal text format and then encode that as UTF-8, because that's what HTML::Parser->parse() expects in utf8_mode(1), unless I still misunderstand the whole story :-(

    That also means you still need to know what the original encoding is and Encode::decode() needs to support that format.
    Yes, this is why I die() if $decode is undef. HTTP::Response->decoded_content() is a wrapper around Encode::decode().

    Thanks for the reply!!

    -- tel