To me, there are two important differences between UCS-2 and UTF-16.
The first important difference is that UCS-2 can only represent U+0000 to U+FFFF, whereas UTF-16 can represent any Unicode character (U+0000 to U+10FFFF).
The second important difference is the number of bytes UCS-2 and UTF-16 use to store a character. Each UCS-2 character is exactly 16 bits in size, whereas UTF-16, like UTF-8, is a variable-width encoding: characters above U+FFFF require two 16-bit words (a surrogate pair).
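To make the variable-width point concrete, here is a minimal sketch using the core Encode module. The helper name utf16be_units is made up for this illustration; it returns the 16-bit words of one code point's UTF-16BE encoding as hex strings.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Hypothetical helper (not from the thread): returns the 16-bit code
# units of a code point's UTF-16BE encoding, as lowercase hex strings.
sub utf16be_units {
    my ($cp) = @_;
    my $bytes = encode('UTF-16BE', chr($cp));
    return unpack '(H4)*', $bytes;
}

# U+0041 is inside U+0000..U+FFFF; U+1F600 is not.
for my $cp (0x41, 0x1F600) {
    my @units = utf16be_units($cp);
    printf "U+%04X: %d word(s): %s\n", $cp, scalar(@units), "@units";
}
# prints:
#   U+0041: 1 word(s): 0041
#   U+1F600: 2 word(s): d83d de00
```

The second character has no UCS-2 representation at all, which is the first difference again.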
for output, byte order is determined by the cpu
No. I'm on an x86 (a little-endian machine), but UTF-16BE was used.
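A quick way to check this yourself, as a sketch: it calls Encode's encode() directly rather than going through a file handle, on the assumption that the :encoding(UTF-16) layer uses the same codec.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Encode with plain "UTF-16" (no BE/LE suffix) and dump the bytes.
my $bytes = encode('UTF-16', 'A');
my $hex   = join ' ', unpack('(H2)*', $bytes);
print "$hex\n";   # fe ff 00 41
```

The leading fe ff is a big-endian BOM and 00 41 is "A" in big-endian order, even on a little-endian CPU.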
for input, byte order is determined by a stream-initial BOM (if the BOM isn't there, perl complains about it; if it is there, perl does not remove it for you).
No. Perl *does* remove it for you, just as it adds one for you on output.
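Again as a sketch with Encode's decode() rather than a file handle (same assumption about the layer sharing the codec). The input bytes are made up for the example: a little-endian BOM followed by "A".

```perl
use strict;
use warnings;
use Encode qw( decode );

# FF FE is a little-endian BOM; 41 00 is "A" in UTF-16LE.
my $with_bom = "\xFF\xFE\x41\x00";
my $str      = decode('UTF-16', $with_bom);
printf "%d char(s), first is U+%04X\n", length($str), ord($str);
# prints: 1 char(s), first is U+0041

# Without a BOM, plain "UTF-16" has no byte order to go on.
my $no_bom = "\x41\x00";
my $ok     = eval { decode('UTF-16', $no_bom); 1 };
print "no BOM: ", ($ok ? "decoded" : "died"), "\n";
```

Only one character comes back, so the BOM was consumed rather than passed through as U+FEFF. If you name the byte order explicitly (UTF-16LE or UTF-16BE), no BOM is expected and a leading one is not stripped.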
In reply to Re^3: CSV nightmare
by ikegami
in thread CSV nightmare
by lorenzov