Re^3: Encoding problem

It depends on whether the data is double-encoded or whether different encodings are used for different parts of the file, hence my request for a sample of the file. I suspect the latter.

Using :encoding twice (assuming it works at all) would only help in the former case. Decoding would have to be done in the reverse of the order used for encoding.

The latter case would involve looking at each byte or group of bytes and making guesses.
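One possible way to make such guesses, sketched under the assumption that the file mixes UTF-8 and iso-latin-1 chunk by chunk (the real file hasn't been shown, so this may not match it): try a strict UTF-8 decode first and fall back to iso-latin-1 when that fails. `decode_guess` is a made-up helper name, not part of Encode.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical helper: guess the encoding of one chunk of bytes.
sub decode_guess {
    my ($bytes) = @_;

    # decode() with a CHECK value may modify its argument, so work on a copy.
    my $copy  = $bytes;
    my $chars = eval { decode('UTF-8', $copy, Encode::FB_CROAK()) };

    # Valid UTF-8: keep it. Otherwise assume iso-latin-1, which accepts any byte.
    return defined($chars) ? $chars : decode('iso-8859-1', $bytes);
}

my $from_utf8   = decode_guess("\xC3\xA9t\xC3\xA9");   # "été" as UTF-8 bytes
my $from_latin1 = decode_guess("\xE9t\xE9");           # "été" as iso-latin-1 bytes
```

Both calls should give the same three-character string. The guess can be wrong: some iso-latin-1 byte sequences also happen to be valid UTF-8.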

PS — Don't use UTF8 (an encoding known only to Perl) when decoding. That leaves you open to a vulnerability. Use UTF-8 instead.
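A quick way to see that the two names really are different encodings is to ask Encode what each one resolves to. This is only a small check, not a demonstration of the vulnerability itself.

```perl
use strict;
use warnings;
use Encode qw( find_encoding );

# "UTF-8" (with the hyphen) resolves to Encode's strict implementation.
my $strict = find_encoding('UTF-8')->name;   # 'utf-8-strict'

# "UTF8" (no hyphen) resolves to Perl's lax variant.
my $lax = find_encoding('UTF8')->name;       # 'utf8'

print "UTF-8 => $strict\n";
print "UTF8  => $lax\n";

# The same naming rule applies to I/O layers, e.g. '<:encoding(UTF-8)'.
```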

Update: Using :encoding twice doesn't always work, if it ever does. If your text is double-encoded, you'll need to use decode($enc1, decode($enc2, $_)) instead.
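As a minimal sketch of that nested decode, assuming the text was encoded as UTF-8 twice (the sample string is made up for illustration):

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# Hypothetical sample: "café", with "é" being U+00E9.
my $text = "caf\x{E9}";

# Simulate the damage: UTF-8 encoding applied twice.
my $double_bytes = encode('UTF-8', encode('UTF-8', $text));

# Undo the encodings in the reverse of the order they were applied.
my $once  = decode('UTF-8', $double_bytes);   # still UTF-8 bytes, one per character
my $fixed = decode('UTF-8', $once);           # the original characters

print $fixed eq $text ? "recovered\n" : "not recovered\n";
```

Here both encodings are the same, so the order doesn't matter. With two different encodings, the inner decode must name the encoding that was applied last.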


Re^4: Encoding problem by grscott (Novice) on May 08, 2009 at 19:34 UTC
Thank you for the prompt response, Ikegami. A couple of very helpful insights there, which I shall attempt to make use of as soon as possible. BTW, I am pretty confident that the entire file has been double encoded. I sure hope, anyway, that that is as bad as it gets... :-)
Re^5: Encoding problem by ikegami (Patriarch) on May 08, 2009 at 20:08 UTC
"I am pretty confident that the entire file has been double encoded."

Due to a special relationship between iso-latin-1 and UTF-8, it's not really possible to double-encode.

`encode('iso-latin-1', encode('UTF-8', $text))` produces the same output as `encode('UTF-8', $text)`.

For the characters where it works, `encode('UTF-8', encode('iso-latin-1', $text))` produces the same output as `encode('UTF-8', $text)`. For the characters where it doesn't, you'll get a question mark ("?").

`encode('iso-latin-1', encode('iso-latin-1', $text))` produces the same output as `encode('iso-latin-1', $text)`.

The only combination where double-encoding is possible when using those encodings is `encode('UTF-8', encode('UTF-8', $text))`.
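The four combinations can be checked with a short script. This is a sketch using a single made-up test character, "é" (U+00E9), which both encodings can represent, and the canonical name iso-8859-1 in place of iso-latin-1.

```perl
use strict;
use warnings;
use Encode qw( encode );

my $text = "\x{E9}";   # "é", representable in both encodings

my $utf8   = encode('UTF-8', $text);        # "\xC3\xA9"
my $latin1 = encode('iso-8859-1', $text);   # "\xE9"

# Three of the four combinations collapse to a single encoding.
my $latin1_of_utf8 = encode('iso-8859-1', $utf8);     # same bytes as $utf8
my $utf8_of_latin1 = encode('UTF-8', $latin1);        # same bytes as $utf8
my $latin1_twice   = encode('iso-8859-1', $latin1);   # same bytes as $latin1

# Only UTF-8 applied twice changes the bytes again.
my $utf8_twice = encode('UTF-8', $utf8);              # "\xC3\x83\xC2\xA9"

printf "%-16s %s\n", $_->[0], unpack('H*', $_->[1])
    for [ 'latin1(utf8)'  => $latin1_of_utf8 ],
        [ 'utf8(latin1)'  => $utf8_of_latin1 ],
        [ 'latin1(latin1)' => $latin1_twice ],
        [ 'utf8(utf8)'    => $utf8_twice ];
```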