To me, there are two important differences between UCS-2 and UTF-16.
The first important difference is that UCS-2 can only represent U+0000 to U+FFFF, whereas UTF-16 can represent any Unicode character (up to U+10FFFF).
The second important difference is the number of bytes UCS-2 and UTF-16 use to store a character. Each UCS-2 character is exactly 16 bits, whereas UTF-16, like UTF-8, is variable-width: characters above U+FFFF need two 16-bit code units (a surrogate pair).
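To make the size difference concrete, here's a small Python sketch (Python rather than Perl, and the helper names `utf16_units`, `fits_ucs2` and `surrogate_pair` are made up for illustration). It counts 16-bit code units per string and shows how a code point above U+FFFF is split into a surrogate pair, which is exactly what UCS-2 has no way to express.

```python
def utf16_units(s: str) -> int:
    """Number of 16-bit code units needed to store s in UTF-16."""
    return len(s.encode("utf-16-be")) // 2


def fits_ucs2(s: str) -> bool:
    """True if every code point lies in U+0000..U+FFFF, the only range UCS-2 covers."""
    return all(ord(ch) <= 0xFFFF for ch in s)


def surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF use surrogate pairs")
    v = cp - 0x10000                 # 20-bit offset
    high = 0xD800 + (v >> 10)        # top 10 bits
    low = 0xDC00 + (v & 0x3FF)       # bottom 10 bits
    return high, low


if __name__ == "__main__":
    for s in ["A", "\u00e9", "\U0001F600"]:
        print(f"U+{ord(s):04X}: {utf16_units(s)} unit(s), fits UCS-2: {fits_ucs2(s)}")
    hi, lo = surrogate_pair(0x1F600)
    print(f"U+1F600 -> {hi:04X} {lo:04X}")
```

"A" and "é" each take one 16-bit unit in both encodings; U+1F600 takes two units in UTF-16 (D83D DE00) and can't be stored in UCS-2 at all.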
> for output, byte order is determined by the cpu
No. I'm on an x86 (a little-endian machine), but big-endian UTF-16 (UTF-16BE) was used.
> for input, byte order is determined by a stream-initial BOM (if the BOM isn't there, perl complains about it; if it is there, perl does not remove it for you).
No. Perl *does* remove it for you, just as it adds one for you on output.
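For anyone who wants to see the mechanics spelled out, here's a rough Python sketch of the BOM handling described above. It is not Perl's code, and the function names are hypothetical; it just assumes the behavior reported in this thread: output gets a BOM followed by big-endian units no matter what the CPU is, and input takes its byte order from a mandatory leading BOM, which is then stripped.

```python
import codecs


def encode_utf16_with_bom(text: str) -> bytes:
    """Assumed output behavior: BOM first, then big-endian UTF-16, regardless of CPU."""
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


def decode_utf16_with_bom(data: bytes) -> str:
    """Assumed input behavior: byte order comes from the leading BOM, which is removed."""
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return data[2:].decode("utf-16-le")
    # No BOM means the byte order is unknown, so refuse rather than guess.
    raise ValueError("UTF-16 data has no BOM; byte order is unknown")


if __name__ == "__main__":
    raw = encode_utf16_with_bom("a,b\n")
    print(raw[:4].hex())
    print(repr(decode_utf16_with_bom(raw)))
```

The byte order is spelled out explicitly because Python's own `"utf-16"` codec behaves differently on output: it writes the BOM in the machine's native order, which is little-endian on x86.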