in reply to Re^3: What does Encode::encode_utf8 do to UTF-8 data ?
in thread What does Encode::encode_utf8 do to UTF-8 data ?
Hmm, this doesn't make sense to me: AFAIK Perl strings never store code points, but rather the UTF-8 encoding of those code points. For example, the string containing a Greek capital Kappa, whose code point is U+039A:

    $str = "\x{039A}";

does not contain 039A in hex, but rather CE9A, the UTF-8 encoding of that code point.
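A minimal sketch contrasting the two views of the Kappa string, using only core Perl (the Encode and bytes modules). length() counts one character at the Perl level, while bytes::length() and encode_utf8 expose the two octets CE 9A. The variable names and printed labels are illustrative only.

```perl
use strict;
use warnings;
use bytes ();                 # load bytes::length without enabling the pragma
use Encode qw(encode_utf8);

my $str = "\x{039A}";         # GREEK CAPITAL LETTER KAPPA

# Perl-level view: one character whose ordinal is 0x39A
printf "chars: %d, ord: %X\n", length($str), ord($str);

# The UTF8 flag is on, because the code point is above 0xFF
printf "utf8 flag: %s\n", utf8::is_utf8($str) ? "on" : "off";

# Internal storage view: bytes::length counts the underlying octets
printf "internal bytes: %d\n", bytes::length($str);

# encode_utf8 returns a new byte string (flag off) holding those same octets
my $octets = encode_utf8($str);
printf "encoded: %s, length %d\n", unpack("H*", $octets), length($octets);
```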
> And the function returned something that was different from the original string, as can be seen below:
What your example seems to demonstrate, as far as I can see, is the difference between the character and byte output of length() when it is given strings with the UTF-8 flag switched on or off.
So the final string, containing alpha, beta, gamma and delta, has a length of 4 characters when Perl knows that it contains valid UTF-8, but a length of 8 when Perl assumes the old byte=character semantics. However, both strings are byte-for-byte identical.
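The length difference can be reproduced with a short sketch. It assumes lowercase alpha to delta (U+03B1 to U+03B4), since the original example isn't quoted here. Encode::_utf8_on is an internal Encode function that sets the UTF8 flag without touching or validating the buffer; it is used here only to show that the same octets give 8 with the flag off and 4 with it on.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Assumption: lowercase alpha, beta, gamma, delta (U+03B1..U+03B4)
my $chars = "\x{3B1}\x{3B2}\x{3B3}\x{3B4}";
my $bytes = encode_utf8($chars);                 # same octets, UTF8 flag off

printf "flag on : length %d\n", length($chars);  # 4
printf "flag off: length %d\n", length($bytes);  # 8
printf "octets  : %s\n", unpack("H*", $bytes);   # ceb1ceb2ceb3ceb4

# Flip only the flag on a copy of the byte string; the buffer is untouched
my $copy = $bytes;
Encode::_utf8_on($copy);
printf "after _utf8_on: length %d, equal to \$chars: %s\n",
    length($copy), ($copy eq $chars ? "yes" : "no");
```

utf8::decode would be the checked, public way to perform the same flag flip on a string that is known to hold valid UTF-8.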
Or, if I'm wrong here, I'm very confused.
Steve Collyer