in reply to problem decoding a quoted-printable euc-jp mail header
As for the third example, euc-jp decodes it correctly, and relabelling it as shiftjis turns it into gibberish.
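One quick way to reproduce that kind of observation is to decode the same bytes under both labels and compare. The following is a minimal Python sketch, used only for illustration. Python's codec names are `euc_jp` and `shift_jis` rather than Encode's `euc-jp` and `shiftjis`. The helper `decode_under_labels` is a hypothetical name, and the two-character string 日本 stands in for the third example, which isn't reproduced here.

```python
# Decode one byte string under several charset labels to see which one fits.
# The sample text is a stand-in; the original header bytes are not shown here.

def decode_under_labels(data: bytes, labels: list[str]) -> dict[str, str]:
    """Decode `data` with each label, substituting U+FFFD for undecodable bytes."""
    return {label: data.decode(label, errors="replace") for label in labels}


if __name__ == "__main__":
    euc_bytes = "日本".encode("euc_jp")  # bytes as an EUC-JP sender would produce them
    for label, text in decode_under_labels(euc_bytes, ["euc_jp", "shift_jis"]).items():
        print(f"{label:>9}: {text!r}")
```

Only the `euc_jp` line gives back 日本. Under the `shift_jis` label, bytes in the 0xA1-0xDF range are read as single half-width katakana, so the same bytes come out as unrelated characters.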
If I understand the documentation I've seen, shiftjis uses one byte per character for ASCII content and two bytes per character for Japanese, where the first byte of a Japanese pair always has the eighth bit set and the second byte may or may not have the eighth bit set.
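To see the "second byte may or may not have the eighth bit set" point concretely, here is a minimal Python sketch that splits Shift_JIS bytes into characters. It assumes the common Shift_JIS layout: lead bytes in 0x81-0x9F and 0xE0-0xFC start a two-byte character, and 0xA1-0xDF are single-byte half-width katakana. That last range means that not every high-bit byte starts a pair. The names `is_sjis_lead` and `split_sjis` are hypothetical.

```python
# Sketch: split a Shift_JIS byte string into characters and show which bytes
# carry the eighth bit. Lead-byte ranges follow the common Shift_JIS layout.

def is_sjis_lead(b: int) -> bool:
    """True if b starts a two-byte character; such bytes always have bit 8 set."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def split_sjis(data: bytes) -> list[bytes]:
    """Split Shift_JIS bytes into 1-byte (ASCII, half-width kana) or 2-byte chunks."""
    chunks = []
    i = 0
    while i < len(data):
        if is_sjis_lead(data[i]) and i + 1 < len(data):
            chunks.append(data[i:i + 2])  # lead byte + trail byte
            i += 2
        else:
            chunks.append(data[i:i + 1])  # single-byte character
            i += 1
    return chunks


if __name__ == "__main__":
    for chunk in split_sjis("A日本".encode("shift_jis")):
        bits = ["set" if b & 0x80 else "clear" for b in chunk]
        print(chunk.hex(), bits)
```

For 日本 this prints `93fa ['set', 'set']` and `967b ['set', 'clear']`. The trail byte of 本 is 0x7B, which is the ASCII code for `{`.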
Meanwhile, euc-jp uses a different strategy to let ASCII and Japanese content coexist: ASCII content is again one byte per character, but Japanese content may be two or three bytes per character, and every byte of a Japanese character must have the eighth bit set.
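The "every byte has the eighth bit set" rule can be turned into a structural check. The following is a minimal Python sketch that assumes the usual EUC-JP layout:

- 0x8E (SS2) introduces a two-byte half-width katakana.
- 0x8F (SS3) introduces a three-byte JIS X 0212 character.
- Bytes 0xA1-0xFE pair up for JIS X 0208.

The name `looks_like_euc_jp` is hypothetical. A passing result only means the bytes are well-formed EUC-JP, not that they were meant as EUC-JP.

```python
# Sketch: structural EUC-JP check. Every byte of a multibyte character
# must have the eighth bit set; a byte below 0x80 is always ASCII.

def looks_like_euc_jp(data: bytes) -> bool:
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                  # ASCII: one byte
            i += 1
        elif b == 0x8E:               # SS2: half-width katakana, 2 bytes
            if i + 1 >= n or not (0xA1 <= data[i + 1] <= 0xDF):
                return False
            i += 2
        elif b == 0x8F:               # SS3: JIS X 0212, 3 bytes
            if i + 2 >= n or not all(0xA1 <= x <= 0xFE for x in data[i + 1:i + 3]):
                return False
            i += 3
        elif 0xA1 <= b <= 0xFE:       # JIS X 0208: 2 bytes
            if i + 1 >= n or not (0xA1 <= data[i + 1] <= 0xFE):
                return False
            i += 2
        else:                         # 0x80-0x8D, 0x90-0xA0, 0xFF never start a character
            return False
    return True


if __name__ == "__main__":
    print(looks_like_euc_jp("日本".encode("euc_jp")))     # True
    print(looks_like_euc_jp("日本".encode("shift_jis")))  # False
```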
Since your first two strings appear to be composed of byte pairs in which the second byte is sometimes in the ASCII range, they cannot be valid euc-jp. They display nicely when treated as shiftjis (and if they also make sense to people who read Japanese, so much the better).
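That reasoning suggests a defensive way to decode such headers: trust the declared label only if the bytes are well-formed under it, and otherwise try shiftjis. The following is a minimal Python sketch built on the standard library's `email.header.decode_header`. The fallback rule is a heuristic assumption, not part of any standard, and `decode_header_with_fallback` is a hypothetical name.

```python
from email.header import decode_header


def decode_header_with_fallback(raw: str, fallback: str = "shift_jis") -> str:
    """Decode RFC 2047 encoded-words, distrusting a label that cannot decode the bytes.

    Heuristic (an assumption): if a strict decode with the declared charset
    fails, retry with `fallback`. A successful strict decode does not prove the
    label is right; it only means the bytes are well-formed under it.
    """
    pieces = []
    for part, charset in decode_header(raw):
        if isinstance(part, str):              # header had no encoded-words at all
            pieces.append(part)
            continue
        label = charset or "ascii"
        try:
            pieces.append(part.decode(label))  # strict: raises on malformed bytes
        except (UnicodeDecodeError, LookupError):
            pieces.append(part.decode(fallback, errors="replace"))
    return "".join(pieces)


if __name__ == "__main__":
    # Shift_JIS bytes for 日本 carrying an euc-jp label
    print(decode_header_with_fallback("=?euc-jp?Q?=93=FA=96=7B?="))
    # Genuine EUC-JP bytes for 日本
    print(decode_header_with_fallback("=?euc-jp?Q?=C6=FC=CB=DC?="))
```

This heuristic has a limitation: bytes that happen to be well-formed under the wrong label pass the strict decode and still come out as gibberish. The check only catches labels that are structurally impossible, which is the situation described for the first two strings.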
As for your finding that they all display correctly in Outlook: euc-jp comes from the Unix world, whereas shiftjis belongs more to the MS-Windows domain. So I wouldn't be surprised to learn that MS software somehow manages to ignore an incorrect encoding label.
So where are those labels coming from, anyway? Is someone sending you bad data, or are you creating bad data?
Re^2: problem decoding a quoted-printable euc-jp mail header
by blahblahblah (Priest) on Oct 12, 2006 at 13:19 UTC