As best as I can tell, a code point is an index into a character set. I don't see how that relates [unless you're saying something is a code point if it's internally encoded using one encoding (UTF-8), but somehow isn't if it's internally encoded using another (iso-latin-1)]. All I have is a packed byte with no association to any character set.
unpack 'C', substr("\xA0$s", 0, 1) doesn't give 0xA0 in 5.8.8, and that's a bug.
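A minimal sketch of the expectation, assuming perl 5.10 or later: the same one-character string U+00A0 is built three ways (from the `substr` above, forced to a single-byte internal representation, and forced to UTF-8 internally), and `unpack 'C'` should report 0xA0 for all three. The value chosen for `$s` is an assumption; any string containing a character above 0xFF forces the concatenation to be stored as UTF-8. On 5.8.8 the upgraded cases presumably yield a byte of the internal encoding instead, which is the bug described above.

```perl
use strict;
use warnings;

# Hypothetical $s: any character above 0xFF forces the concatenation
# below to be stored internally as UTF-8.
my $s = "\x{263A}";

# The first character is U+00A0, whatever the internal storage.
my $first = substr("\xA0$s", 0, 1);

# The same one-character string in both internal representations.
my $latin1 = "\xA0";
utf8::downgrade($latin1);    # stored as the single byte A0
my $wide = "\xA0";
utf8::upgrade($wide);        # stored as the two bytes C2 A0

for my $str ($first, $latin1, $wide) {
    printf "utf8 flag: %-3s  unpack 'C': 0x%02X\n",
        (utf8::is_utf8($str) ? 'on' : 'off'),
        unpack('C', $str);
}
```

The internal flag differs between the strings, but the character, and therefore the packed byte, does not.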
have always been about how the bytes are encoded into memory
Of course. You're saying it also matters how those encoded bytes are encoded, and I disagree with that.
unpack "H*", pack "U", 0x1234 results under 5.10 are mostly nonsensical, not "fixed".
I forgot to address this earlier.
I can see a case for unpack 'H*', $characters_higher_than_255 returning something more sensible. It's the same idea as allowing characters above 255 for 'C': it doesn't hurt anything, and it even aids backwards compatibility.
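Until then, one way to get a meaningful hex dump of a string holding characters above 255 is to say which bytes you want by encoding explicitly, rather than relying on how perl stores the string internally. A minimal sketch using the core Encode module; the choice of UTF-8 and UTF-16LE as target encodings is illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $str = pack 'U', 0x1234;    # a one-character string, U+1234

# Encode first, so unpack 'H*' only ever sees bytes (0..255).
my $utf8_hex    = unpack 'H*', encode('UTF-8',    $str);
my $utf16le_hex = unpack 'H*', encode('UTF-16LE', $str);

print "UTF-8:    $utf8_hex\n";     # e188b4
print "UTF-16LE: $utf16le_hex\n";  # 3412
```

Because the encoding step is explicit, the result is the same whichever internal representation perl picked for `$str`.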
In reply to Re^8: Parsing UTF-16LE CSV Records Using Text::CSV* (5.10)
by ikegami
in thread Parsing UTF-16LE CSV Records Using Text::CSV*
by Jim