How to encode for non-unicode output

cheerful has asked for the wisdom of the Perl Monks concerning the following question:

I have some image files which have Unicode captions. I used ExifTool to extract them. However, I don't know how to output them to an HTML file in their native encoding.

If I write them out without encoding (all output goes to STDOUT, which is redirected to a file), the output can be viewed if the charset is set to UTF-8. The font is ugly but the text is correct.

However, if I try any of the following, I get garbage:

binmode(STDOUT, ":encoding(euc-cn)")

binmode(STDOUT, ":encoding(gb2312)")

or convert each individual string

$text = encode('euc-cn', $text)

I even tried calling decode('UTF-8', $text) first, but that does not work either. What's the proper way to output in the correct encoding/charset? Thanks!


Re: How to encode for non-unicode output by moritz (Cardinal) on Nov 04, 2008 at 19:36 UTC
"I have some image files which have Unicode captions."

Unicode is not a character encoding. If ExifTool doesn't decode the strings for you, you have to do it yourself, and you have to know their encoding first. There's no way around that.

"However, I don't know how to output them to an HTML file in their native encoding."

In which "native encoding"? That of the HTML files? Which encoding is that?

Let me get this straight: when you want to change the encoding of something, Encode (or the IO layers) is the way to go, but you have to know both the source and the destination encoding.

Also make sure to always test with reliable tools, and as early as possible. `hexdump` in conjunction with an encoding table is reliable. Browsers (which often try to guess an encoding, and sometimes fail) are not.
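A minimal sketch of the point about knowing both encodings. It assumes the caption arrives as UTF-8 bytes and that the target is euc-cn; both are assumptions taken from the question, not confirmed facts about the files. The sample bytes and variable names are made up for illustration, and `unpack('H*', ...)` stands in for `hexdump` so the result can be checked without a browser.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 bytes for the two characters U+4E2D U+6587, written as escapes
# so the example does not depend on this file's own encoding.
my $utf8_bytes = "\xE4\xB8\xAD\xE6\x96\x87";

# Step 1: bytes in the (assumed) source encoding -> Perl character string.
my $chars = decode('UTF-8', $utf8_bytes);

# Step 2: character string -> bytes in the (assumed) destination encoding.
my $euc_bytes = encode('euc-cn', $chars);

# Inspect the raw bytes as hex instead of trusting a browser's guess.
printf "characters: %d\n", length $chars;
printf "utf-8:      %s\n", unpack('H*', $utf8_bytes);
printf "euc-cn:     %s\n", unpack('H*', $euc_bytes);
```

For this sample the euc-cn line should read d6d0cec4, which can be compared against a GB2312 code table.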
Re^2: How to encode for non-unicode output by Anonymous Monk on Nov 05, 2008 at 11:31 UTC
Sometimes you can use Encode::Guess.
Re^3: How to encode for non-unicode output by cheerful (Initiate) on Nov 05, 2008 at 14:55 UTC
decode("Guess", $text) worked. Since I did not specify any suspects, trial and error led it to UTF-8. Since the un-encoded output looks fine as UTF-8, the original text is probably UTF-8, or ExifTool decoded it. But somehow perl does not know that when it tries to encode. Does the decode call just tell perl it's UTF-8?
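For reference, a small sketch of what guessing with the default suspects looks like, assuming the input is a UTF-8 byte string (the sample bytes are invented for illustration). `guess_encoding` returns an encoding object on success and an error string otherwise, so the result has to be checked before it is used.

```perl
use strict;
use warnings;
use Encode::Guess;   # installs guess_encoding() into this package

# Invented sample: UTF-8 bytes of U+4E2D U+6587, still undecoded.
my $data = "\xE4\xB8\xAD\xE6\x96\x87";

# No suspects are passed, so only the module's default suspects are tried.
my $guess = guess_encoding($data);
die "Cannot guess encoding: $guess" unless ref $guess;

# Decoding turns the byte string into a character string.
my $chars = $guess->decode($data);

printf "guessed %s: %d bytes -> %d characters\n",
    $guess->name, length $data, length $chars;
```

Guessing can fail or be ambiguous on other inputs, so naming the source encoding explicitly is the safer choice when it is known.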
Re^4: How to encode for non-unicode output by Anonymous Monk on Nov 05, 2008 at 15:07 UTC
Re^2: How to encode for non-unicode output by cheerful (Initiate) on Nov 05, 2008 at 14:16 UTC
ExifTool uses UTF-8 by default. If I print it out without encoding, the text is correct with the charset set to UTF-8. So either the decoding is already done, or the source is UTF-8.
Re^3: How to encode for non-unicode output by moritz (Cardinal) on Nov 05, 2008 at 15:55 UTC
If the source is UTF-8, most string operations (like encoding into a specified character encoding) behave very differently in the two cases (decoded or not decoded). If it's indeed decoded, `encode($destination_encoding, $string)` will work (but you still need to know in which encoding you want to store it).
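A rough illustration of the two cases, assuming UTF-8 input and euc-cn as the destination (the sample bytes and variable names are hypothetical). The same call to `encode` is applied once to a decoded string and once to the raw bytes, and the results are compared as hex.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: UTF-8 bytes of U+4E2D U+6587.
my $bytes   = "\xE4\xB8\xAD\xE6\x96\x87";   # not decoded
my $decoded = decode('UTF-8', $bytes);      # decoded

# length() counts bytes in the first case and characters in the second.
printf "undecoded length: %d\n", length $bytes;
printf "decoded length:   %d\n", length $decoded;

# Encoding the decoded string converts the two characters.
my $right = encode('euc-cn', $decoded);

# Encoding the undecoded string treats each byte as a separate character,
# so the output is not the intended text.
my $wrong = encode('euc-cn', $bytes);

printf "from decoded:   %s\n", unpack('H*', $right);
printf "from undecoded: %s\n", unpack('H*', $wrong);
```

If the second hex line is what ends up in the output file, the string was most likely never decoded before `encode` or the `:encoding(...)` layer saw it.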