in reply to How to encode for non-unicode output

I have some image files which have Unicode captions.

Unicode is not a character encoding. If ExifTool doesn't decode the strings for you, you have to do it yourself, and for that you have to know their encoding first. There's no way around that.

However, I don't know how to output them to an HTML file in their native encoding.

In which "native encoding"? That of the HTML files? Which encoding is that?

Let me get this straight: when you want to change the encoding of something, Encode (or the IO layers) is the way to go, but you have to know both the source and the destination encoding.
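A minimal sketch of that two-step round trip with Encode (the byte string and both encodings here are made up for illustration):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw bytes as they might come from a file: "café" stored as UTF-8.
my $bytes = "caf\xC3\xA9";

# Step 1: decode from the source encoding into Perl's internal
# character string.
my $text = decode('UTF-8', $bytes);

# Step 2: encode into the destination encoding (here ISO-8859-1).
my $latin1 = encode('ISO-8859-1', $text);   # bytes "caf\xE9"
```

Both steps need an encoding name; Encode can't supply either one for you.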

Also, make sure to test with reliable tools, and as early as possible. hexdump in conjunction with an encoding table is reliable; browsers (which often try to guess an encoding, and sometimes fail) are not.
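If hexdump isn't handy, a few lines of Perl do the same job; `hex_bytes` below is just a throwaway helper for illustration:

```perl
use strict;
use warnings;

# A tiny stand-in for hexdump: show the raw bytes of a scalar so you
# can check them against an encoding table by eye.
sub hex_bytes {
    my ($s) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $s;
}

print hex_bytes("caf\xC3\xA9"), "\n";   # 63 61 66 C3 A9
```

Seeing `C3 A9` where you expect one character is a strong hint that the data is UTF-8 bytes, not decoded text.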

Replies are listed 'Best First'.
Re^2: How to encode for non-unicode output
by Anonymous Monk on Nov 05, 2008 at 11:31 UTC

      decode("Guess", $text) worked. Since I did not specify the suspects, trial and error leads to UTF-8.

      Since the un-encoded output looks fine as UTF-8, the original text is probably UTF-8, or ExifTool decoded it. But somehow Perl does not know that when it tries to encode. Does the decode call just tell Perl it's UTF-8?
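It does more than set a flag: decode validates the bytes and converts them into Perl's internal character representation. One way to see the difference (example data made up for illustration):

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes = "caf\xC3\xA9";          # 5 bytes of UTF-8
print length($bytes), "\n";         # 5 -- still just bytes

my $text = decode('UTF-8', $bytes); # validate and convert
print length($text), "\n";          # 4 -- characters: c a f é
```

After decode, string operations like length, regexes, and encode all work on characters instead of raw bytes.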

Re^2: How to encode for non-unicode output
by cheerful (Initiate) on Nov 05, 2008 at 14:16 UTC
    ExifTool uses UTF-8 by default. If I print it out without encoding, the text is correct with the charset set to UTF-8. So the decoding is done, or the source is UTF-8.
      Even if the source is UTF-8, it matters which of the two it is: most string operations (like encoding into a specified character encoding) behave very differently in the two cases (decoded or not decoded).

      If it's indeed decoded, encode($destination_encoding, $string) will work (but you still need to know in which encoding you want to store it).
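For the original goal of writing the captions to an HTML file, an `:encoding` IO layer can do the encode step on output. A sketch, assuming the caption is already decoded; the filename and the choice of UTF-8 are illustrative:

```perl
use strict;
use warnings;

my $caption = "caf\x{E9}";   # already-decoded characters

# The layer encodes everything printed to $fh; the charset you pick
# here must match what the HTML declares.
open my $fh, '>:encoding(UTF-8)', 'caption.html' or die $!;
print {$fh} qq{<meta charset="UTF-8">\n$caption\n};
close $fh or die $!;
```

This is equivalent to calling encode('UTF-8', ...) on each string yourself before printing raw bytes; either way, you (not Perl) choose the destination encoding.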