Re^2: Cannot decode string with wide characters

When you convert from UTF-8 (or UTF-16 or any other 'encoding') to Unicode, you are decoding. When you convert from Unicode to UTF-8, you are encoding. Unicode is an integer like 8634

Same, but cleaned up a bit:

When you convert from UTF-8 (or UTF-16 or any other 'encoding') to Unicode, you are decoding. When you convert from Unicode to UTF-8, you are encoding. A Unicode string consists of code points, integers like 8634.
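To make the two directions concrete, here is a minimal sketch. It is in Python rather than Perl purely for illustration, and the variable names are my own; the code point 8634 is the one used above.

```python
# A text string is a sequence of code points; 8634 is U+21BA.
text = chr(8634)

# Encoding: code points -> bytes.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-be")

# Decoding: bytes -> code points. Both encodings give back the same text.
from_utf8 = utf8_bytes.decode("utf-8")
from_utf16 = utf16_bytes.decode("utf-16-be")

print(ord(text), utf8_bytes.hex(), utf16_bytes.hex())
# prints: 8634 e286ba 21ba
```

The same code point comes out as three bytes in UTF-8 and two bytes in UTF-16be, which is why you have to know the encoding before you can decode.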

UTF-16 is very easy to parse. Whatever is reading the string just blindly reads two-byte chunks (16 bits) from the string, and whatever is in those two bytes is a Unicode integer. However, UTF-8 is a tricky encoding. It uses from 1 to 4 bytes to store a Unicode integer.

UTF-16le and UTF-16be are variable-length encodings just like UTF-8. There are 0x110000 Unicode code points (though most aren't assigned), and that doesn't fit in 16 bits. A code point encoded in UTF-16 takes 2 or 4 bytes: anything above U+FFFF is stored as a surrogate pair. For example, the UTF-16be encoding of U+10000 is bytes D8 00 DC 00.
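The D8 00 DC 00 example can be checked by hand. The sketch below (again Python, for illustration only) computes the UTF-16 code units for a single code point; `utf16_units` is a hypothetical helper name, and it does no validation of the input.

```python
def utf16_units(code_point):
    """Return the 16-bit UTF-16 code units for one code point (no validation)."""
    if code_point < 0x10000:
        # Fits in one 16-bit unit: 2 bytes.
        return [code_point]
    # Above U+FFFF: split the 20-bit offset into a surrogate pair, 4 bytes.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]

for cp in (0x21BA, 0x10000):
    encoded = chr(cp).encode("utf-16-be")
    units = " ".join(f"{u:04X}" for u in utf16_units(cp))
    print(f"U+{cp:04X}: {len(encoded)} bytes, units {units}")
# prints:
# U+21BA: 2 bytes, units 21BA
# U+10000: 4 bytes, units D800 DC00
```

A reader that blindly takes two bytes at a time would see D800 and DC00 as two separate values, neither of which is a character on its own.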

UCS-2le and UCS-2be are fixed-width encodings, but they can only encode a subset of Unicode (code points zero to 0xFFFF).
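For contrast, here is a rough sketch of what a fixed-width UCS-2be encoder looks like. Python has no built-in UCS-2 codec that I know of, so `ucs2be_encode` is a hypothetical helper written for this post, not a library function.

```python
import struct

def ucs2be_encode(text):
    """Hypothetical helper: encode text as UCS-2be, always 2 bytes per code point."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # UCS-2 has no surrogate pairs, so this code point cannot be stored.
            raise ValueError(f"U+{cp:04X} is outside the UCS-2 range")
        out += struct.pack(">H", cp)  # one big-endian 16-bit value
    return bytes(out)

print(ucs2be_encode("A" + chr(0x21BA)).hex())
# prints: 004121ba
```

Passing `chr(0x10000)` to this helper raises `ValueError`, which is exactly the subset limitation: the width is fixed, but only because everything above 0xFFFF is refused.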
