in reply to Unicode Help

Unicode is a mapping between numbers (code points) and characters. You refer to a 16-bit encoding of characters, which is not the same thing as the general concept of Unicode. What you are looking at could be any of several different encoding schemes.
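
To make the distinction concrete, here is a minimal sketch using the core Encode module. The helper name hex_bytes is made up for illustration, and the three encodings shown are just common examples, not a guess at what your application wants.

```perl
use strict;
use warnings;
use Encode qw(encode);

# hex_bytes is a hypothetical helper: encode $text with the named
# encoding and return the resulting bytes as space-separated hex.
sub hex_bytes {
    my ($encoding, $text) = @_;
    my $bytes = encode($encoding, $text);
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

# One character, one code point: U+00FC (u with umlaut).
my $char = "\x{FC}";

# The same code point becomes different bytes under each encoding.
for my $encoding (qw(UTF-8 UTF-16BE UTF-16LE)) {
    printf "%-8s %s\n", $encoding, hex_bytes($encoding, $char);
}
```

The code point never changes; only the bytes on the wire do, which is why you need to know the encoding and not just "Unicode".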

This doesn't answer your question directly, but you might want to read up on my FMTYEWTK about Characters vs Bytes node to get a better understanding of how to think about encoding.

--
[ e d @ h a l l e y . c c ]

Re: Re: Unicode Help
by zby (Vicar) on Mar 17, 2004 at 17:33 UTC
    In the linked node there is nothing about UTF-16, which the OP apparently needs to use. Here is the Wikipedia page about it: UTF-16.
      I think my point is that it's still just a guess that the application needs UTF-16. Microsoft uses a non-Unicode "DBCS" character set which is not the same as UTF-16, but would look very similar for many simple samples. Assumptions are dangerous.

      --
      [ e d @ h a l l e y . c c ]

        Windows uses UCS-2. It is Unicode and standard. It is equivalent to UTF-16 for all characters in the Basic Multilingual Plane (BMP), that is, all the characters representable in 16 bits. For code points above U+FFFF, UTF-16 uses surrogate pairs, which encode each such character in 32 bits.
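
        A small sketch of the difference, assuming the core Encode module. The helper name utf16be_units is made up for this example, and the two sample characters are arbitrary picks from below and above U+FFFF.

```perl
use strict;
use warnings;
use Encode qw(encode);

# utf16be_units is a hypothetical helper: encode $text as UTF-16BE and
# return each 16-bit code unit as a four-digit hex string.
sub utf16be_units {
    my ($text) = @_;
    # unpack 'n*' reads the encoded bytes as big-endian 16-bit integers.
    return map { sprintf '%04X', $_ } unpack 'n*', encode('UTF-16BE', $text);
}

# U+20AC (euro sign) fits in 16 bits: one code unit.
print join(' ', utf16be_units("\x{20AC}")), "\n";

# U+1D11E (musical G clef) is above U+FFFF: a surrogate pair, two code units.
print join(' ', utf16be_units("\x{1D11E}")), "\n";
```

        The first line prints a single unit (20AC), the second prints two (D834 DD1E). A strict UCS-2 consumer has no way to represent the second character at all.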
Re: Re: Unicode Help
by Avox (Sexton) on Mar 17, 2004 at 17:29 UTC
    Hey, thanks for the link. I don't think my issue is really with Unicode, but with trying to encode the string I want to send into a 16-bit format. Is there an easy way to do this?
      You'll need to find out what *encoding* they really want, and then target that *encoding*.

      You have two choices: look at some examples and decide something arbitrarily, like "the first byte is ASCII, the second byte is zero," or actually find out what the application is expecting. The former can get you running; the latter will avoid sticky problems when a message must include non-ASCII characters like u-with-umlaut or capital-sigma or elvish-parma.

      If you find the actual *encoding* standard they expect, you'll probably find a Perl module that will help you encode to that scheme without much fuss. It would be a rare standard that forced you to encode things yourself, but if it comes to that, the Perl builtin functions pack and unpack are a good start to your solution.
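
      As a rough sketch of both routes, and assuming purely for illustration that the application turns out to want little-endian UTF-16 (that is a guess, not something established in this thread), the module route and the hand-rolled pack route look like this. The sub names and the sample message are made up.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Module route: let Encode handle the (assumed) UTF-16LE encoding.
sub encode_with_module {
    my ($text) = @_;
    return encode('UTF-16LE', $text);
}

# Hand-rolled route: pack each code point as a little-endian 16-bit
# integer ('v'). Only valid for code points up to U+FFFF.
sub encode_by_hand {
    my ($text) = @_;
    return pack 'v*', map { ord } split //, $text;
}

my $message = "Hello";   # hypothetical payload
my $same = encode_with_module($message) eq encode_by_hand($message);
print $same ? "identical bytes\n" : "different bytes\n";
```

      For ASCII text both give "the character byte, then a zero byte", so a quick look at samples cannot tell them apart. The pack version does not produce surrogate pairs for anything above U+FFFF, which is one reason to prefer the module once you know the real encoding.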

      --
      [ e d @ h a l l e y . c c ]