in reply to Encoding Problem

Check the respective code page listings, which you can find here: http://www.unicode.org/Public/MAPPINGS/ISO8859/ and here: http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/. It turns out that cp1254 and 8859-9 assign the same characters to every byte outside the 0x80-0x9f range -- the only difference is that all the cp125* pages put printable characters (curly quotes, dashes, the euro sign and so on) into 0x80-0x9f, where the 8859-* pages just have "control characters" (effectively nothing useful).
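
If you would rather not compare the two tables by eye, you can decode every byte value under both encodings and list the ones that disagree. This is only a minimal sketch: it assumes Perl's Encode module accepts the names 'cp1254' and 'iso-8859-9', and the helper name differing_bytes is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return the byte values (0..255) that decode to
# different characters under the two named encodings.
sub differing_bytes {
    my ($enc_a, $enc_b) = @_;
    my @diff;
    for my $byte (0x00 .. 0xFF) {
        my $char_a = decode($enc_a, chr $byte);
        my $char_b = decode($enc_b, chr $byte);
        push @diff, $byte if $char_a ne $char_b;
    }
    return @diff;
}

# Print each disagreeing byte value in hex, one per line.
my @diff = differing_bytes('cp1254', 'iso-8859-9');
printf "0x%02X\n", $_ for @diff;
```

Every value it prints should fall inside 0x80-0x9f; the rest of the two tables should agree.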

So if you use cp1254 for everything that isn't utf8, you should be fine -- the 8859-9 data will be using a subset of the characters defined by the cp1254 table. (And it's easy to tell whether something is utf8 or not: try to decode it as if it were utf8, and if that fails, you know it isn't utf8.)
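
The "try utf8 first, otherwise use cp1254" approach could look roughly like this. It is a sketch under assumptions, not a drop-in solution: the function name decode_utf8_or_cp1254 is invented here, and it assumes the input is a string of raw, undecoded bytes.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw bytes as strict UTF-8 if possible,
# otherwise fall back to cp1254 (which also covers 8859-9 data).
sub decode_utf8_or_cp1254 {
    my ($bytes) = @_;

    # FB_CROAK makes decode() die on malformed UTF-8; LEAVE_SRC keeps
    # $bytes intact so the fallback still sees the original data.
    my $text = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return $text if defined $text;

    return decode('cp1254', $bytes);
}
```

Pure ASCII input is valid utf8 and decodes to the same characters either way. A short cp1254 string can occasionally happen to be valid utf8 as well, so the test is a strong heuristic rather than a proof.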

Re^2: Encoding Problem
by anlamarama (Acolyte) on Nov 13, 2009 at 03:57 UTC

    Thanks, I did not know that.

    However, I tried to decode cp1254-encoded data with iso-8859-9, and it gave me garbled text. I have tried it again and it works; you are right. I guess I mixed up encodings or files while testing (there are lots of files and encodings). Sorry about that.

    So I guess I should not have blamed Encode::Guess either. :)

    Thanks again,

      I tried to decode cp1254 encoded data with iso-8859-9, it gave me garbled text.

      If you had been asking for errors or warnings from Encode, it would have given you those as well.

      Make sure you understand the "superset/subset" relation: cp1254 is a superset of 8859-9 (8859-9 is a subset of cp1254). That means treating cp1254 data as if it were 8859-9 data is likely to go wrong whenever the data uses bytes in the 0x80-0x9f range (they come out as control characters instead of the intended punctuation), whereas treating 8859-9 data as if it were cp1254 will not fail.

      And yes, Encode::Guess was apparently doing the right thing and giving you the correct answer, if the text you gave it happened to actually be 8859-9 (because such text could also be cp1254). But if you gave it single-byte-per-character text that included a lot of bytes in the 0x80-0x9f range, and it said "this could be 8859-9", I would call that a disappointing mistake.
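
      If you want to make that check yourself rather than rely on a guesser, looking for bytes in the 0x80-0x9f range is enough to tell "must be cp1254" from "could be either". A minimal sketch follows; the helper name has_c1_range_bytes is made up, and it assumes you have already ruled out utf8, since utf8 continuation bytes also land in that range.

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: true if the raw bytes contain anything in the
      # 0x80-0x9f range, where 8859-9 only has control characters.
      sub has_c1_range_bytes {
          my ($bytes) = @_;
          return $bytes =~ /[\x80-\x9F]/ ? 1 : 0;
      }
      ```

      A true result means the data is almost certainly not plain 8859-9; a false result means it decodes identically under both encodings.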