Re: Unicode Help
by halley (Prior) on Mar 17, 2004 at 17:22 UTC
Unicode is a mapping between numbers (code points) and characters. You refer to 16-bit encodings for characters, which is not the same thing as the general concept of Unicode. A "16-bit format" could be any of a number of different encoding schemes.
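To make the distinction concrete, here is a minimal sketch in Python (chosen only for illustration; the thread itself is about Perl). It takes one character, "ü" (code point U+00FC), and serializes it with four different encodings. The Unicode number stays the same and only the bytes change:

```python
# One character, one Unicode code point (U+00FC), several byte encodings.
ch = "\u00fc"  # "ü", u-with-umlaut
encoded = {enc: ch.encode(enc) for enc in ("utf-8", "utf-16-le", "utf-16-be", "latin-1")}

print(f"code point: U+{ord(ch):04X}")
for enc, data in encoded.items():
    # Same character, different bytes depending on the chosen encoding.
    print(f"{enc:10} {data.hex(' ')}")
```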
This doesn't answer your question directly, but you might want to read up on my FMTYEWTK about Characters vs Bytes node to get a better understanding of how to think about encoding.
-- [ e d @ h a l l e y . c c ]
The linked node says nothing about UTF-16, which the OP apparently needs to use. Here is the Wikipedia page on it: UTF-16.
My point is that it's still just a guess that the application needs UTF-16. Microsoft also uses non-Unicode "DBCS" (double-byte character set) encodings, which are not the same as UTF-16 but could look very similar in many simple samples. Assumptions are dangerous.
-- [ e d @ h a l l e y . c c ]
Hey, thanks for the link. I don't think my issue is really with Unicode; it's with encoding the string I want to send into a 16-bit format. Is there an easy way to do this?
You'll need to find out what *encoding* they really want, and then target that *encoding*.
You have two choices: look at some examples and decide something arbitrarily, like "the first byte is ASCII, the second byte is zero," or actually find out what the application is expecting. The former can get you running; the latter will avoid sticky problems when a message must include non-ASCII characters like u-with-umlauts or capital-sigma or elvish-parma.
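Here is a sketch of the first choice, written in Python for illustration. `naive_16bit` is a hypothetical name for the guessed "ASCII byte, then a zero byte" scheme; it is not any real library function:

```python
def naive_16bit(text: str) -> bytes:
    """Hypothetical guessed scheme: each character as one ASCII byte followed by a zero byte."""
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if code > 0x7F:
            # The guess has no rule for anything outside ASCII.
            raise ValueError(f"guessed scheme cannot represent U+{code:04X}")
        out += bytes([code, 0])
    return bytes(out)


# For plain ASCII the guess happens to coincide with UTF-16LE (no byte order mark)...
assert naive_16bit("<msg/>") == "<msg/>".encode("utf-16-le")

# ...but it breaks on the first non-ASCII character, e.g. capital sigma (U+03A3).
try:
    naive_16bit("\u03a3")
except ValueError as err:
    print(err)
```

Because the guess matches UTF-16LE for ASCII text, it can look correct in testing and only fail once real-world text shows up.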
If you find the actual *encoding* standard they expect, you'll probably find a Perl module that will help you encode to that scheme without much fuss. It would be a rare standard that forced you to encode things yourself, but if you must, the Perl built-in functions pack and unpack are a good start.
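If it does come to packing by hand, the idea looks roughly like the following, sketched with Python's `struct` module. The Perl equivalent in the docstring is my approximation (Perl's `v` template is an unsigned 16-bit little-endian value), and a real codec such as Perl's Encode module remains the better tool:

```python
import struct


def pack_utf16le(text: str) -> bytes:
    """Pack each code point as an unsigned 16-bit little-endian integer.

    Approximately Perl's pack("v*", unpack("U*", $text)). Only correct for
    characters in the Basic Multilingual Plane (code points up to U+FFFF).
    """
    codes = [ord(ch) for ch in text]
    if any(c > 0xFFFF for c in codes):
        # Characters above U+FFFF need surrogate pairs, which this sketch does not do.
        raise ValueError("needs surrogate pairs; use a real UTF-16 encoder instead")
    return struct.pack(f"<{len(codes)}H", *codes)


sample = "Hello \u03a3"
# For BMP text, the hand-packed bytes agree with the library codec.
assert pack_utf16le(sample) == sample.encode("utf-16-le")
print(pack_utf16le(sample).hex(" "))
```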
-- [ e d @ h a l l e y . c c ]
Re: Unicode Help
by Avox (Sexton) on Mar 17, 2004 at 18:49 UTC
Well, with the help of a bit of network sniffing, I've been comparing the message I'm sending to the one a working client is sending. I've started using the UTF-16 encoding, and the two messages are identical except for the first 4 bytes at the beginning and the last byte at the end of the "working message".
working message begins with: FF FE FF c3 3C
not working message begins with: FE FF 00 3C
The end of the non-working message just needs a 00 tacked on.
Does this mean anything to anyone?
The byte sequences FF FE and FE FF are byte order marks (BOMs): the character U+FEFF serialized in little-endian or big-endian order. A decoder uses them to determine the byte order of UTF-16 data. You need to ask which UTF-16 byte order the server is using. I am guessing it is using UTF-16LE, since the working message starts with FF FE. Using the correct byte order may make the "extra" bytes go away.
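As a sketch in Python (illustration only; the OP is working in Perl), here is the character `<` (0x3C) with a BOM in each byte order. The big-endian form is exactly the start of the non-working message. The little-endian form starts with the same FF FE mark as the working message, and in that order the zero high byte of an ASCII character comes last, which would also account for the trailing 00, assuming the server does expect UTF-16LE with a BOM:

```python
import codecs

text = "<"

# UTF-16 with an explicit byte order mark, built in both byte orders.
big_endian = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
little_endian = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print("BE:", big_endian.hex(" "))     # fe ff 00 3c
print("LE:", little_endian.hex(" "))  # ff fe 3c 00

# A generic UTF-16 decoder reads the BOM to pick the byte order, then drops it.
assert big_endian.decode("utf-16") == text
assert little_endian.decode("utf-16") == text
```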
This does not explain all of your symptoms, but it may lead you to a better understanding. From the UTF-16 page on Wikipedia:
The UTF-16 encoding scheme mandates that the byte order must be declared by prepending a Byte Order Mark before the first serialized character. This BOM is the encoded version of the Zero-Width No-Break Space character, Unicode number FEFF in hex, manifesting as the byte sequence FE FF for big-endian, or FF FE for little-endian. A BOM at the beginning of UTF-16 encoded data is considered to be a signature separate from the text itself; it is for the benefit of the decoder.
The UTF-16LE and UTF-16BE encoding schemes are identical to the UTF-16 encoding schemes, but rather than using a BOM, the byte order is implicit in the name of the encoding (LE for little-endian, BE for big-endian). A BOM at the beginning of UTF-16LE or UTF-16BE encoded data is not considered to be a BOM; it is part of the text itself.
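The difference the quote describes can be seen directly with Python's built-in codecs (again only an illustration; the codec names are Python's, not Perl's). The same four bytes decode differently depending on whether the scheme is "UTF-16" or "UTF-16LE":

```python
data = b"\xff\xfe" + "<".encode("utf-16-le")  # ff fe 3c 00

# "UTF-16": the leading FF FE is a byte order mark and the decoder consumes it.
as_utf16 = data.decode("utf-16")

# "UTF-16LE": the byte order is fixed by the name, so FF FE is just U+FEFF
# (zero-width no-break space) and stays in the decoded text.
as_utf16le = data.decode("utf-16-le")

print(repr(as_utf16))    # '<'
print(repr(as_utf16le))  # '\ufeff<'

# Encoding with the LE codec never writes a BOM; prepend one yourself if the peer wants it.
assert "<".encode("utf-16-le") == b"<\x00"
```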
Re: Unicode Help
by Avox (Sexton) on Mar 17, 2004 at 19:15 UTC
Seriously, people, thanks a lot for all the help. I'll give the little-endian stuff a try and report back.
Well, due to deadlines, I wasn't able to pursue the pure-Perl version of this as I'd hoped. I ended up writing a little MFC command-line app to send the message. Since I compiled it with Microsoft's Unicode settings, it all works. I just run the command-line app from Perl now. When I get some time, I still hope to go back and figure this out. Thanks, everyone...