Re: Unicode Help

Well, with the help of a bit of network sniffing, I've been comparing the message I'm sending to the one a working client is sending. I've started using the UTF-16 encoding, and the two messages are now identical except for the first 4 bytes at the beginning and the last byte at the end of the "working message".

Working message begins with: FF FE FF c3 3C
Non-working message begins with: FE FF 00 3C

The end of the non-working message just needs a 00 tacked on.

Does this mean anything to anyone?

Re: Re: Unicode Help by iburrell (Chaplain) on Mar 17, 2004 at 19:09 UTC
The byte sequences FF FE and FE FF are byte order marks (BOMs). They tell a decoder which byte order a UTF-16 stream uses. You need to find out which byte order of UTF-16 the server expects. I am guessing that it is using UTF-16LE, since FF FE is the little-endian byte order mark. Using the correct byte order may make the "extra" bytes go away.
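To make the byte order difference concrete, here is a minimal Python sketch (Python is used only for illustration; the helper name `hex_bytes` and the one-character sample text are assumptions, not part of the original messages). It encodes "<" in both byte orders with an explicit BOM in front. The big-endian form reproduces the FE FF 00 3C that the non-working message starts with.

```python
import codecs

def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex, like a sniffer dump."""
    return " ".join(f"{b:02X}" for b in data)

text = "<"  # hypothetical sample: the first character of the message

# Explicit BOM followed by the text in the matching byte order.
big_endian = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
little_endian = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print(hex_bytes(big_endian))     # FE FF 00 3C
print(hex_bytes(little_endian))  # FF FE 3C 00
```

This only shows what the two byte orders look like on the wire. It does not account for every byte in the working message's prefix.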
Re: Re: Unicode Help by zby (Vicar) on Mar 17, 2004 at 19:10 UTC
This does not explain all of your symptoms, but it can lead you to a better understanding. From the UTF-16 page on Wikipedia:

"The UTF-16 encoding scheme mandates that the byte order must be declared by prepending a Byte Order Mark before the first serialized character. This BOM is the encoded version of the Zero-Width No-Break Space character, Unicode number FEFF in hex, manifesting as the byte sequence FE FF for big-endian, or FF FE for little-endian. A BOM at the beginning of UTF-16 encoded data is considered to be a signature separate from the text itself; it is for the benefit of the decoder. The UTF-16LE and UTF-16BE encoding schemes are identical to the UTF-16 encoding schemes, but rather than using a BOM, the byte order is implicit in the name of the encoding (LE for little-endian, BE for big-endian). A BOM at the beginning of UTF-16LE or UTF-16BE encoded data is not considered to be a BOM; it is part of the text itself."
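The distinction between a BOM as signature and a BOM as text can be seen in a short Python sketch (Python is used only for illustration, and the sample string "<a>" is an assumption). The same bytes are decoded once as "utf-16", where the BOM is consumed, and once as "utf-16-le", where it survives as the character U+FEFF.

```python
import codecs

# Hypothetical sample data: little-endian BOM followed by UTF-16LE text.
data = codecs.BOM_UTF16_LE + "<a>".encode("utf-16-le")

# "utf-16": the BOM is a signature; it selects the byte order and is dropped.
as_utf16 = data.decode("utf-16")

# "utf-16-le": the byte order is implied by the name, so FF FE is decoded
# as ordinary text and shows up as U+FEFF at the start of the string.
as_utf16le = data.decode("utf-16-le")

print(as_utf16 == "<a>")          # True
print(as_utf16le == "\ufeff<a>")  # True
print(len(as_utf16), len(as_utf16le))  # 3 4
```

So whether a leading FF FE counts as part of the message depends on which of the encoding names the receiving side is decoding with.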