in reply to Re^7: Parsing UTF-16LE CSV Records Using Text::CSV* (5.10)
in thread Parsing UTF-16LE CSV Records Using Text::CSV*
As best as I can tell, a code point is an index into a character set. I don't see how that relates [unless you're saying something is a code point if it's internally encoded using one encoding (UTF-8), but somehow isn't if it's internally encoded using another (iso-latin-1)]. All I have is a packed byte with no association to any character set.
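To illustrate the distinction (my own sketch, not from the post): `ord` reports a code point, an index into Unicode, while a byte produced by `pack 'C'` carries no character set until you choose one to interpret it under. The cp1252/cp1251 comparison below is just an example pairing.

```perl
use strict;
use warnings;
use Encode qw( decode );

# ord() reports a code point: an index into a character set (Unicode).
my $cp = ord("\x{263A}");          # 9786, i.e. U+263A

# pack 'C' produces a bare byte with no character set attached.
my $byte = pack 'C', 0xE0;

# The same byte maps to different code points depending on which
# character set you choose to interpret it under.
printf "cp1252: U+%04X\n", ord decode('cp1252', $byte);  # U+00E0, LATIN SMALL LETTER A WITH GRAVE
printf "cp1251: U+%04X\n", ord decode('cp1251', $byte);  # U+0430, CYRILLIC SMALL LETTER A
```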
`unpack 'C', substr("\xA0$s", 0, 1)` doesn't give 0xA0 in 5.8.8, and that's a bug.
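A minimal sketch of that case, assuming the point is to force the string into Perl's internal UTF-8 representation; the character U+2660 is an arbitrary choice of mine for that purpose.

```perl
use strict;
use warnings;

# Give $s a character above 0xFF so it is stored internally as UTF-8.
my $s = "\x{2660}";

# Concatenating "\xA0" upgrades the whole string, so its first character
# is U+00A0, stored internally as the two bytes 0xC2 0xA0.
my $first = substr("\xA0$s", 0, 1);

# The complaint above: under 5.8.8 this did not print 0xA0 (the internal
# encoding leaked through); the output can differ on later perls.
printf "perl %s: 0x%02X\n", $], unpack 'C', $first;
```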
> have always been about how the bytes are encoded into memory

Of course. You're saying it also matters how those bytes are themselves encoded internally, and I disagree with that.
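A small illustration of that position (mine, not from the post): two strings holding the same character but stored with different internal encodings are indistinguishable at the character level.

```perl
use strict;
use warnings;

# Two strings containing the same single character, U+00A0.
my $native   = "\xA0";
my $upgraded = "\xA0";
utf8::upgrade($upgraded);   # same character, now stored as UTF-8 internally

# At the character level the internal encoding is invisible.
print ord($native),   "\n";                               # 160
print ord($upgraded), "\n";                               # 160
print $native eq $upgraded ? "equal\n" : "different\n";   # equal
```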
unpack "H*", pack "U", 0x1234 results under 5.10 are mostly nonsensical, not "fixed".
I forgot to address this earlier.
I can see a case for `unpack 'H*', $characters_higher_than_255` returning something more sensible. It's the same idea as allowing characters above 255 for `C`: it doesn't hurt anything, and it even aids backwards compatibility.
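For reference, a minimal sketch of the expression under discussion; it deliberately asserts no particular output, since what it prints (and whether it warns) is exactly what differs between 5.8.x and 5.10.

```perl
use strict;
use warnings;

# One-character string whose code point is U+1234; Perl stores it
# internally as UTF-8.
my $str = pack 'U', 0x1234;

# No particular result is asserted here: what 'H*' reports for a string
# containing characters above 255 depends on the perl version, which is
# the disagreement above.
my ($hex) = unpack 'H*', $str;
printf "perl %s: length=%d H*=%s\n", $], length($str), $hex;
```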