in reply to Re^3: Encoding problem
in thread Encoding problem

Thank you for the prompt response, Ikegami. A couple of very helpful insights there, which I shall attempt to make use of as soon as possible.

BTW, I am pretty confident that the entire file has been double encoded. I sure hope, anyway, that that is as bad as it gets... :-)

Re^5: Encoding problem
by ikegami (Patriarch) on May 08, 2009 at 20:08 UTC

    I am pretty confident that the entire file has been double encoded.

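    Due to a special relationship between iso-latin-1 and UTF-8, double-encoding isn't really possible in most combinations of the two. iso-latin-1 maps every character from U+0000 to U+00FF to the single byte with the same value, and the output of encode is always a string of such bytes, so encoding that output again with iso-latin-1 changes nothing.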
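    A minimal sketch of that relationship using Perl's Encode module. It assumes 'iso-8859-1' (Encode's canonical name for iso-latin-1), and the sample string "caf\x{E9} \x{20AC}" is illustrative only. It checks that iso-latin-1 encoding and decoding leave every value from 0 to 255 unchanged, and that iso-latin-1 therefore passes UTF-8 output through untouched.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Every character from U+0000 to U+00FF, i.e. the whole iso-latin-1 repertoire.
my $all_latin1 = join '', map { chr($_) } 0 .. 255;

# Encoding with iso-latin-1 turns each character into the byte with the
# same numeric value, so the result compares equal to the input.
my $bytes = encode('iso-8859-1', $all_latin1);
print "encode is identity: ", ($bytes eq $all_latin1 ? "yes" : "no"), "\n";

# Decoding maps each byte back to the character with the same value.
my $chars = decode('iso-8859-1', $bytes);
print "decode is identity: ", ($chars eq $all_latin1 ? "yes" : "no"), "\n";

# encode('UTF-8', ...) always returns values in 0..255, so applying
# iso-latin-1 on top of it changes nothing (illustrative sample text).
my $utf8 = encode('UTF-8', "caf\x{E9} \x{20AC}");
print "iso-latin-1 leaves UTF-8 output unchanged: ",
    (encode('iso-8859-1', $utf8) eq $utf8 ? "yes" : "no"), "\n";
```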

    • encode('iso-latin-1', encode('UTF-8', $text))
      produces the same output as
      encode('UTF-8', $text)
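    • For characters that exist in iso-latin-1 (U+0000 to U+00FF),

      encode('UTF-8', encode('iso-latin-1', $text))
      produces the same output as
      encode('UTF-8', $text)

      For characters outside that range, encode('iso-latin-1', ...) puts a question mark ("?") in their place, so the text is damaged rather than double-encoded.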

    • encode('iso-latin-1', encode('iso-latin-1', $text))
      produces the same output as
      encode('iso-latin-1', $text)
    • The only combination where double-encoding is possible when using those encodings is

      encode('UTF-8', encode('UTF-8', $text))
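    A minimal sketch that checks the four cases above with Perl's Encode module. It assumes 'iso-8859-1' as the Encode name for iso-latin-1, and the strings "caf\x{E9}" and "\x{20AC}5" are illustrative samples, not data from this thread. The last step shows that, if a file really was UTF-8-encoded twice, decoding it as UTF-8 twice recovers the original text.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Illustrative samples: U+00E9 is in iso-latin-1, U+20AC is not.
my $latin_text = "caf\x{E9}";
my $wide_text  = "\x{20AC}5";

# Reference: a single UTF-8 encode gives the bytes "caf\xC3\xA9".
my $once = encode('UTF-8', $latin_text);

# Case 1: iso-latin-1 applied to UTF-8 output changes nothing.
my $l1_over_u8 = encode('iso-8859-1', encode('UTF-8', $latin_text));

# Case 2: UTF-8 applied to iso-latin-1 output matches a single UTF-8 encode...
my $u8_over_l1 = encode('UTF-8', encode('iso-8859-1', $latin_text));
# ...but characters outside iso-latin-1 were already replaced by "?".
my $lossy = encode('UTF-8', encode('iso-8859-1', $wide_text));

# Case 3: iso-latin-1 twice is the same as once.
my $l1_twice = encode('iso-8859-1', encode('iso-8859-1', $latin_text));

# Case 4: UTF-8 twice is a real double encoding: "\xC3\xA9" becomes
# "\xC3\x83\xC2\xA9", which displays as "Ã©" when read as UTF-8.
my $u8_twice = encode('UTF-8', encode('UTF-8', $latin_text));

# Decoding UTF-8 twice undoes case 4.
my $repaired = decode('UTF-8', decode('UTF-8', $u8_twice));

sub report {
    my ($label, $ok) = @_;
    printf "%-45s %s\n", $label, $ok ? 'yes' : 'no';
}

report('case 1 equals a single UTF-8 encode:', $l1_over_u8 eq $once);
report('case 2 equals a single UTF-8 encode:', $u8_over_l1 eq $once);
report('case 2 replaces U+20AC with "?":',     $lossy eq '?5');
report('case 3 equals a single latin-1 encode:',
       $l1_twice eq encode('iso-8859-1', $latin_text));
report('case 4 differs from a single UTF-8 encode:', $u8_twice ne $once);
report('decoding twice restores the text:',   $repaired eq $latin_text);
```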