"You need to decode() to the internal text format."
I realize I should have commented the code. That's what HTTP::Response->decoded_content() does. It determines the character set from the Content-Type header (though I think it does not take <meta http-equiv="content-type" content="text/html; charset=XXX"> into account, but that's another story). So I decode the content to the internal text format and then encode that as UTF-8, because that's what HTML::Parser->parse() expects in utf8_mode(1), unless I still misunderstand the whole story :-(
"That also means you still need to know what the original encoding is and Encode::decode() needs to support that format."
Yes, this is why I die() if $decode is undef. HTTP::Response->decoded_content() is a wrapper around Encode::decode().
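To make the steps described above concrete, here is a minimal sketch, not the code from the earlier post. It builds an HTTP::Response by hand instead of fetching one, so the ISO-8859-1 body, the variable names, and the text handler are assumptions made for illustration only.

```perl
use strict;
use warnings;
use HTTP::Response;
use HTML::Parser;
use Encode qw(encode);

# Hypothetical response: an ISO-8859-1 page, built by hand so no network is needed.
my $response = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=ISO-8859-1' ],
    "<p>caf\xE9</p>",    # \xE9 is e-acute in ISO-8859-1
);

# Step 1: octets -> Perl's internal text format, using the charset
# from the Content-Type header. undef means the decoding failed.
my $text = $response->decoded_content();
die "could not decode response content" unless defined $text;

# Step 2: internal text -> UTF-8 octets, which is what the parser
# is fed when utf8_mode is switched on.
my $octets = encode('UTF-8', $text);

# Step 3: parse the UTF-8 octets and collect the text nodes.
my $collected = '';
my $parser = HTML::Parser->new(
    api_version => 3,
    text_h      => [ sub { $collected .= shift }, 'dtext' ],
);
$parser->utf8_mode(1);
$parser->parse($octets);
$parser->eof;
```

With utf8_mode(1) the handler receives UTF-8 octets, not characters, so anything written out afterwards should go to a filehandle without a :utf8 layer, or be decoded again first. Otherwise the text gets encoded twice.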
In reply to Re^4: Writing unicode characters to file using open($fh, ">:utf8, $name) mangles unicode? by telcontar
in thread Writing unicode characters to file using open($fh, ">:utf8, $name) mangles unicode? by telcontar