in reply to How to encode for non-unicode output

I have some image files which have Unicode captions.

Unicode is not a character encoding. If ExifTool doesn't decode the strings for you, you have to do it yourself, and for that you have to know their encoding first. There's no way around that.

However, I don't know how to output them to an HTML file in their native encoding.

In which "native encoding"? That of the HTML files? Which encoding is that?

Let me get this straight: when you want to change the encoding of something, Encode (or the IO layers) is the way to go, but you have to know both the source and the destination encoding.
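A minimal sketch of that two-step round trip with Encode (the byte string and both encodings here are made up for illustration):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw bytes as they might come from a file: "café" stored as UTF-8.
my $bytes = "caf\xC3\xA9";

# Step 1: decode from the source encoding into Perl's internal
# character string.
my $text = decode('UTF-8', $bytes);

# Step 2: encode into the destination encoding (here ISO-8859-1).
my $latin1 = encode('ISO-8859-1', $text);   # bytes "caf\xE9"
```

Both steps need an encoding name; Encode can't supply either one for you.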

Also, make sure to test with reliable tools, and as early as possible. hexdump in conjunction with an encoding table is reliable; browsers (which often try to guess an encoding, and sometimes fail) are not.
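If hexdump isn't handy, a few lines of Perl do the same job; `hex_bytes` below is just a throwaway helper for illustration:

```perl
use strict;
use warnings;

# A tiny stand-in for hexdump: show the raw bytes of a scalar so you
# can check them against an encoding table by eye.
sub hex_bytes {
    my ($s) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $s;
}

print hex_bytes("caf\xC3\xA9"), "\n";   # 63 61 66 C3 A9
```

Seeing `C3 A9` where you expect one character is a strong hint that the data is UTF-8 bytes, not decoded text.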

Replies are listed 'Best First'.
Re^2: How to encode for non-unicode output
by Anonymous Monk on Nov 05, 2008 at 11:31 UTC

      decode("Guess", $text) worked. Since I did not specify the suspects, trial and error leads to UTF-8.

      Since the un-encoded output looks fine as UTF-8, the original text is probably UTF-8, or ExifTool decoded it. But somehow Perl does not know that when it tries to encode. Does the decode call just tell Perl it's UTF-8?
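It does more than set a flag: decode validates the bytes and converts them into Perl's internal character representation. One way to see the difference (example data made up for illustration):

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes = "caf\xC3\xA9";          # 5 bytes of UTF-8
print length($bytes), "\n";         # 5 -- still just bytes

my $text = decode('UTF-8', $bytes); # validate and convert
print length($text), "\n";          # 4 -- characters: c a f é
```

After decode, string operations like length, regexes, and encode all work on characters instead of raw bytes.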

Re^2: How to encode for non-unicode output
by cheerful (Initiate) on Nov 05, 2008 at 14:16 UTC
    ExifTool uses UTF-8 by default. If I print it out without encoding, the text is correct with the charset set to UTF-8. So the decoding is done, or the source is UTF-8.
      Even if the source is UTF-8, it matters which of the two it is: most string operations (like encoding into a specified character encoding) behave very differently in the two cases (decoded or not decoded).

      If it's indeed decoded, encode($destination_encoding, $string) will work (but you still need to know in which encoding you want to store it).
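For the original goal of writing the captions to an HTML file, an `:encoding` IO layer can do the encode step on output. A sketch, assuming the caption is already decoded; the filename and the choice of UTF-8 are illustrative:

```perl
use strict;
use warnings;

my $caption = "caf\x{E9}";   # already-decoded characters

# The layer encodes everything printed to $fh; the charset you pick
# here must match what the HTML declares.
open my $fh, '>:encoding(UTF-8)', 'caption.html' or die $!;
print {$fh} qq{<meta charset="UTF-8">\n$caption\n};
close $fh or die $!;
```

This is equivalent to calling encode('UTF-8', ...) on each string yourself before printing raw bytes; either way, you (not Perl) choose the destination encoding.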