in reply to Dealing with strange data encoding issue

Looks like UTF-16 (aka UCS-2) to me.

See perlunicode

Update: I guess I'm still learning... but that explains why I don't see references to UCS-2 anymore.

Re^2: Dealing with strange data encoding issue
by ikegami (Patriarch) on Feb 20, 2009 at 23:23 UTC

    UTF-16 is not quite the same thing as UCS-2.

    • UCS-2 uses a single 16-bit word per character. It is therefore unable to serialize characters other than U+0000 .. U+FFFF.
    • UTF-16 uses one or two 16-bit words per character (two for characters above U+FFFF), and is thus able to encode all Unicode characters.
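
    To make the difference in word counts concrete, here is a minimal sketch in Python (chosen only for illustration; the sample characters and the code_units helper are my own assumptions, not something from the OP). It prints the 16-bit words UTF-16 produces for one character inside U+0000 .. U+FFFF and one outside it.

```python
# Sample characters picked for illustration only.
bmp_char = "\u20ac"         # EURO SIGN, inside U+0000 .. U+FFFF
astral_char = "\U0001F600"  # GRINNING FACE, above U+FFFF


def code_units(data):
    """Hypothetical helper: split big-endian bytes into 16-bit words, as hex strings."""
    return [data[i:i + 2].hex() for i in range(0, len(data), 2)]


bmp_units = code_units(bmp_char.encode("utf-16-be"))
astral_units = code_units(astral_char.encode("utf-16-be"))

print(bmp_units)     # one 16-bit word
print(astral_units)  # two 16-bit words (a surrogate pair)
```

    The second character has no UCS-2 representation at all, since UCS-2 has no way to spend two words on one character.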

    Basically, UCS-2 is to UTF-16 as iso-latin-1 is to UTF-8.

    There's not enough info to know whether the encoding used in the OP is UCS-2 or UTF-16. To play it safe, accept UTF-16 and send UCS-2.
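
    A rough sketch of "accept UTF-16 and send UCS-2", again in Python for illustration. The function names are hypothetical, and the little-endian default is an assumption; the OP doesn't say which byte order is in use. Python has no UCS-2 codec, so the sender checks the code points itself and then relies on UTF-16 producing the same bytes for that range.

```python
def decode_incoming(data, byte_order="le"):
    """Accept UTF-16: surrogate pairs are decoded into single characters."""
    # Assumption: the byte order is known in advance ("le" or "be").
    return data.decode("utf-16-" + byte_order)


def encode_outgoing(text, byte_order="le"):
    """Send UCS-2: refuse anything that needs more than one 16-bit word."""
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError("U+%04X cannot be represented in UCS-2" % ord(ch))
    # For U+0000 .. U+FFFF, UTF-16 yields exactly one 16-bit word per character.
    return text.encode("utf-16-" + byte_order)


incoming = decode_incoming(b"\x3d\xd8\x00\xde")  # a UTF-16LE surrogate pair
outgoing = encode_outgoing("hi")
```

    Output produced this way is valid under either reading of the format, which is the point of sending the narrower encoding.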

    Update: Notable flaw in the comparison:

    • Characters supported by UCS-2 are encoded identically in UTF-16.
    • Characters U+0080 .. U+00FF are encoded differently in iso-latin-1 (one byte) and UTF-8 (two bytes).
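
    The two points above can be checked with a short Python sketch (illustration only; the sample character is my choice, and the UCS-2 value is packed by hand because Python ships no UCS-2 codec).

```python
import struct

ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, sample character

latin1_bytes = ch.encode("latin-1")
utf8_bytes = ch.encode("utf-8")

# UCS-2 is simply the code point as one 16-bit word (big-endian here).
ucs2_bytes = struct.pack(">H", ord(ch))
utf16_bytes = ch.encode("utf-16-be")

print(latin1_bytes.hex(), utf8_bytes.hex())  # different byte sequences
print(ucs2_bytes.hex(), utf16_bytes.hex())   # same byte sequence
```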
Re^2: Dealing with strange data encoding issue
by Zapawork (Scribe) on Feb 20, 2009 at 22:19 UTC
    UTF-16 it is!

    Thank you so much
    Dave -- Saving the world one node at a time