in reply to Re^2: What does Encode::encode_utf8 do to UTF-8 data ?
in thread What does Encode::encode_utf8 do to UTF-8 data ?
> However, in my example, the contents of the string passed to encode_utf8 were not code points - they were already in UTF8, and were therefore left unaltered by encode_utf8.

No. The string passed to encode_utf8 did contain codepoints; that's what Perl strings are. And the function returned something different from the original string, as can be seen below:
```perl
use strict;
use warnings;
use Encode;
use charnames qw(greek);

for ("ABCD", "ABC\N{delta}", "\N{alpha}\N{beta}\N{gamma}\N{delta}") {
    printf "orig len=%d, enc len=%d\n",
        length($_), length(Encode::encode_utf8($_));
}
__END__

$ perl /tmp/p
orig len=4, enc len=4
orig len=4, enc len=5
orig len=4, enc len=8
```

How the string is internally represented in Perl is (almost always) completely irrelevant. Perl sees strings as a list of codepoints; typically if all the codepoints are < 256, perl stores them using one byte per codepoint; if any are >= 256, it stores them all as a variable number of bytes using (as it happens) utf8 encoding internally.
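As a rough sketch of why the internal storage doesn't matter (assuming a perl recent enough to have the built-in utf8:: functions; the variable names are just for illustration), you can force a string into the internal utf8 storage and see that, as far as Perl code is concerned, it is still the same string:

```perl
use strict;
use warnings;

# "\xe9" is codepoint 233; every codepoint here is < 256, so perl
# stores this literal using one byte per codepoint.
my $bytes_stored = "caf\xe9";

# Copy it, then force the copy into the internal utf8 storage.
# Only the storage changes; the codepoints stay the same.
my $utf8_stored = $bytes_stored;
utf8::upgrade($utf8_stored);

printf "internal utf8 flag: %d vs %d\n",
    (utf8::is_utf8($bytes_stored) ? 1 : 0),
    (utf8::is_utf8($utf8_stored)  ? 1 : 0);
printf "lengths: %d vs %d\n", length($bytes_stored), length($utf8_stored);
printf "compare equal: %s\n", ($bytes_stored eq $utf8_stored ? "yes" : "no");
```

The flag differs between the two copies, but length() and eq treat them identically, which is all that normally matters.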
Regardless of how the original string happens to be stored internally, Encode::encode_utf8() returns a string containing one codepoint for each of the octets of what would be the utf8 representation of the original string.
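To make that concrete, here is a minimal sketch (the variable names are mine, not from the thread) that dumps the codepoints before and after encoding, and checks that two differently stored copies of the same string encode to the same thing:

```perl
use strict;
use warnings;
use Encode;

# One codepoint: U+03B4, GREEK SMALL LETTER DELTA.
my $orig = "\x{3b4}";
my $enc  = Encode::encode_utf8($orig);

# Each character of the result is one octet of the utf8 encoding.
printf "orig: %s\n", join " ", map { sprintf "U+%04X", ord($_) } split //, $orig;
printf "enc:  %s\n", join " ", map { sprintf "0x%02X", ord($_) } split //, $enc;

# The same codepoints stored two different ways encode identically.
my $latin    = "\xe9";
my $upgraded = $latin;
utf8::upgrade($upgraded);
printf "same encoding: %s\n",
    (Encode::encode_utf8($latin) eq Encode::encode_utf8($upgraded) ? "yes" : "no");
```

The single codepoint U+03B4 comes back as the two codepoints 0xCE and 0xB4, so the encoded string has length 2 where the original had length 1.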
Dave.