scollyer has asked for the wisdom of the Perl Monks concerning the following question:
AFAICS, Encode::encode_utf8 merely removes the UTF-8 flag, as the program below seems to demonstrate.
The Encode documentation says that encode(ENCODING, ..) "Encodes a string from Perl's internal form into ENCODING and returns a sequence of octets". Now, my understanding is that Perl's internal encoding *is* UTF-8, so that when applied to a string with the UTF-8 flag on, Encode::encode_utf8 is essentially a no-op and merely switches off the UTF-8 flag.
Am I confused?
Steve Collyer
################# code follows #######################
#!/usr/bin/perl
use strict;
use warnings;
use Encode;
use charnames qw(greek);

binmode(STDOUT, ":utf8");

# Character string: '<', four Greek letters, '>'
my $utf8_data = "<\N{alpha}\N{beta}\N{gamma}\N{delta}>";
print $utf8_data, "\n\n";

my $enc_utf8_data = Encode::encode_utf8($utf8_data);

# Report the state of the UTF-8 flag on each string
print Encode::is_utf8($enc_utf8_data)
    ? "\$enc_utf8_data marked as UTF-8\n\n"
    : "\$enc_utf8_data not marked as UTF-8\n\n";
print Encode::is_utf8($utf8_data)
    ? "\$utf8_data marked as UTF-8\n\n"
    : "\$utf8_data not marked as UTF-8\n\n";

# Compare the two strings, then dump each one as hex
# (test is "ne" so that each message matches its branch)
if ($utf8_data ne $enc_utf8_data) {
    print "strings differ\n";
    print "utf8_data ", unpack("H*", $utf8_data), "\n";
    print "enc_utf8_data ", unpack("H*", $enc_utf8_data), "\n";
}
else {
    print "strings are the same\n";
    print "utf8_data ", unpack("H*", $utf8_data), "\n";
    print "enc_utf8_data ", unpack("H*", $enc_utf8_data), "\n";
}
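One way to probe the "essentially a no-op" reading is to compare the two strings only through Perl-level operations (length, ord, eq) rather than through the internal flag. The following is a minimal sketch, not part of the original post. The variable names $chars and $octets are illustrative, and the Greek letters are written as \x{...} escapes so that charnames is not needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Character string: '<', alpha, beta, gamma, delta, '>'
my $chars = "<\x{3b1}\x{3b2}\x{3b3}\x{3b4}>";

# Octet string holding the UTF-8 encoding of those characters
my $octets = Encode::encode_utf8($chars);

# Lengths as seen by Perl code: characters versus octets
printf "chars:  length %d, flag %s\n",
    length($chars),  Encode::is_utf8($chars)  ? "on" : "off";
printf "octets: length %d, flag %s\n",
    length($octets), Encode::is_utf8($octets) ? "on" : "off";

# Each element of $chars is a code point; each element of $octets is a byte
print "chars:  ", join(" ", map { sprintf "%04X", ord } split //, $chars),  "\n";
print "octets: ", join(" ", map { sprintf "%02X", ord } split //, $octets), "\n";

# The two strings do not compare equal; decoding the octets restores the original
print "chars eq octets:          ", ($chars eq $octets ? "yes" : "no"), "\n";
print "chars eq decoded octets:  ",
    ($chars eq Encode::decode_utf8($octets) ? "yes" : "no"), "\n";
```

Seen this way, the result of encode_utf8 is a different string at the Perl level (ten one-byte elements instead of six characters), even if the bytes held internally happen to match.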
Re: What does Encode::encode_utf8 do to UTF-8 data ?
by dave_the_m (Monsignor) on Oct 03, 2005 at 11:15 UTC
by scollyer (Sexton) on Oct 03, 2005 at 12:32 UTC
by dave_the_m (Monsignor) on Oct 03, 2005 at 13:27 UTC
by scollyer (Sexton) on Oct 03, 2005 at 13:59 UTC
by dave_the_m (Monsignor) on Oct 03, 2005 at 14:56 UTC
by scollyer (Sexton) on Oct 03, 2005 at 13:36 UTC
Re: What does Encode::encode_utf8 do to UTF-8 data ?
by tphyahoo (Vicar) on Oct 03, 2005 at 10:34 UTC