in reply to Re^3: Writing unicode characters to file using open($fh, ">:utf8, $name) mangles unicode?

You need to decode() to the internal text format.
I realize I should have commented the code. That's what HTTP::Response->decoded_content() does. It determines the character set from the Content-Type header (though I think it does not take <meta http-equiv="content-type" content="text/html; charset=XXX"> into account, but that's another story). So I decode the content to the internal text format and then convert that into UTF-8, because that's what HTML::Parser->parse() expects in utf8_mode(1), unless I still misunderstand the whole story :-(
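
To make the steps concrete, here is a minimal sketch of what I mean, not the actual code from my script. The response is built by hand, and the ISO-8859-1 body, the @chunks array and the text handler are made up for illustration.

```perl
use strict;
use warnings;
use Encode ();
use HTTP::Response;
use HTML::Parser;

# Hypothetical response: "café" as ISO-8859-1 bytes, charset in the header.
my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=ISO-8859-1' ],
    "<p>caf\xE9</p>",
);

# Bytes -> Perl character string, using the charset from Content-Type.
my $text = $res->decoded_content;
die "Could not decode response body\n" unless defined $text;

# Character string -> UTF-8 encoded bytes for the parser.
my $utf8 = Encode::encode('UTF-8', $text);

# Collect the text segments the parser reports (illustrative handler).
my @chunks;
my $parser = HTML::Parser->new(
    api_version => 3,
    text_h      => [ sub { push @chunks, shift }, 'dtext' ],
);
$parser->utf8_mode(1);    # treat the input as raw UTF-8 bytes
$parser->parse($utf8);
$parser->eof;
```

In utf8_mode(1) the handler receives UTF-8 bytes as well, so they can go to a file opened without an encoding layer.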

That also means you still need to know what the original encoding is and Encode::decode() needs to support that format.
Yes, this is why I die() if $decode is undef. HTTP::Response->decoded_content() is a wrapper around Encode::decode().
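
The check can also be done up front with Encode alone. This is only a sketch under my own assumptions: decode_or_undef() is a hypothetical helper, not part of Encode or LWP, and the charset names are examples.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decode $bytes from $charset. Returns undef when
# Encode does not know the charset or the bytes are malformed for it.
sub decode_or_undef {
    my ($charset, $bytes) = @_;
    my $enc = Encode::find_encoding($charset) or return undef;
    my $text = eval { $enc->decode($bytes, Encode::FB_CROAK) };
    return $text;    # undef if decode() croaked
}

my $ok  = decode_or_undef('ISO-8859-1',      "caf\xE9");
my $bad = decode_or_undef('no-such-charset', "caf\xE9");

defined $ok or die "Cannot decode content\n";
```

Here $bad stays undef because Encode has no encoding by that name, which is the case where I die().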

Thanks for the reply!!

-- tel