In reply to "Re: Cannot decode string with wide characters", in thread "Cannot decode string with wide characters - I'm not decoding!"
When you convert from UTF-8 (or UTF-16 or any other 'encoding') to Unicode, you are decoding. When you convert from Unicode to UTF-8, you are encoding. Unicode is an integer like 8634
Same, but cleaned up a bit:
When you convert from UTF-8 (or UTF-16 or any other 'encoding') to Unicode, you are decoding. When you convert from Unicode to UTF-8, you are encoding. A Unicode string consists of code points, integers like 8634.
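To make the direction of each operation concrete, here is a minimal sketch. It is written in Python purely for illustration (the thread itself is about Perl), and the variable names are my own. It takes the code point 8634 from above, encodes it to UTF-8 bytes, then decodes those bytes back to code points.

```python
# 8634 is the example code point from the text (U+21BA).
text = chr(8634)                    # a Unicode string holding one code point

encoded = text.encode("utf-8")      # encoding: code points -> bytes
decoded = encoded.decode("utf-8")   # decoding: bytes -> code points

print(encoded.hex(" "))             # the UTF-8 bytes, in hex
print([ord(c) for c in decoded])    # the code points recovered from the bytes
```

The bytes are what goes to a file or socket; the code points are what the program works with in between.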
UTF-16 is very easy to parse. Whatever is reading the string just blindly reads two byte chunks (16 bits) from the string, and whatever is in those two bytes is a Unicode integer. However, UTF-8 is a tricky encoding. It uses 1 to 4 bytes to store a Unicode integer.
UTF-16le and UTF-16be are variable-length encodings just like UTF-8. There are 0x110000 Unicode code points (though most aren't assigned), and that many values don't fit in 16 bits. A code point encoded in UTF-16 takes 2 or 4 bytes. For example, the UTF-16be encoding of U+10000 is bytes D8 00 DC 00.
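The 2-or-4-byte behaviour can be sketched by hand. The function below is a hypothetical helper of my own (`utf16be_encode` is not a library function), again in Python for illustration. It follows the standard surrogate-pair scheme: code points below 0x10000 are written as one 16-bit unit, and anything above is split into a high and a low surrogate.

```python
def utf16be_encode(cp: int) -> bytes:
    """Encode a single code point as UTF-16be (illustrative helper)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp < 0x10000:
        # Fits in one 16-bit unit: 2 bytes.
        return cp.to_bytes(2, "big")
    # Needs a surrogate pair: 4 bytes.
    v = cp - 0x10000               # 20 bits remain
    high = 0xD800 | (v >> 10)      # top 10 bits
    low = 0xDC00 | (v & 0x3FF)     # bottom 10 bits
    return high.to_bytes(2, "big") + low.to_bytes(2, "big")

print(utf16be_encode(0x10000).hex(" "))   # the U+10000 example from the text
print(utf16be_encode(8634).hex(" "))      # a code point that fits in 2 bytes
```

So a reader that blindly takes two bytes at a time and calls each pair a code point will get U+10000 wrong: it would see the two surrogates D800 and DC00 instead.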
UCS-2le and UCS-2be are fixed-width encodings, but they can only encode a subset of Unicode (code points zero to 0xFFFF).