Re: Unicode Pack/Unpack Woes

If you do not "use utf8;", then it is to be expected that your line

my @bytes = unpack("U*",$input);

splits the input into bytes rather than characters.
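
The bytes-versus-characters difference can be sketched with Encode. The sample text below is made up, since the original $input is not shown. split and ord are used instead of unpack so that the result does not depend on pack template modes.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample text: U+00E9 (e with acute accent) and U+20AC (euro sign).
my $chars = "\x{E9}\x{20AC}";

# The same text as UTF-8 octets, as it would arrive from a file or socket
# that was read without any decoding.
my $octets = encode('UTF-8', $chars);

# Splitting the undecoded octets gives one element per byte...
my @bytes = map { ord($_) } split //, $octets;

# ...while splitting the decoded string gives one element per character.
my @code_points = map { ord($_) } split //, decode('UTF-8', $octets);

print 'bytes:       ', join(' ', map { sprintf '%02X', $_ } @bytes), "\n";         # C3 A9 E2 82 AC
print 'code points: ', join(' ', map { sprintf 'U+%04X', $_ } @code_points), "\n"; # U+00E9 U+20AC
```

Five elements come back for two characters when the string has not been decoded, which is the symptom described above.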

I advise you to write "use utf8;" at the very start of your program, and to switch it off in local scopes with "no utf8;" where needed.
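
A minimal sketch of how that scoping behaves. In Perl 5.8 and later the pragma is lexically scoped and controls how string literals in the source file are read; data coming from outside (such as $input) still has to be decoded. The sketch assumes the script file itself is saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;    # string literals in this file are read as UTF-8 characters

my $with_pragma = "é";        # one character, U+00E9

my $without_pragma;
{
    no utf8;                  # in effect only until the end of this block
    $without_pragma = "é";    # the same source bytes, now read as two bytes
}

my $after_block = "é";        # "use utf8" is back in effect here

printf "with use utf8: %d, inside no utf8: %d, after the block: %d\n",
    length $with_pragma, length $without_pragma, length $after_block;
```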

Also, you really do not need to re-implement the UTF-8 <-> code point conversion yourself. It is much simpler:

use utf8;
use Encode;
my $utf8str = decode("ucs-2", $input);  # or "euc-jp", depending on the encoding of $input

See "perldoc Encode" for what I mean; it answers a lot of your questions!
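
To make the decode approach concrete, here is a small sketch. The input bytes are hypothetical, and big-endian UCS-2 ("UCS-2BE") is assumed as the encoding; substitute whatever encoding $input really uses, e.g. "euc-jp".

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: "A" (U+0041) and the euro sign (U+20AC),
# encoded as big-endian UCS-2, i.e. two bytes per character.
my $input = "\x00\x41\x20\xAC";

# decode() turns the raw bytes into a Perl character string.
my $chars = decode('UCS-2BE', $input);

# Each element of split // is now one character, so ord gives the
# code point directly; no hand-written byte arithmetic is needed.
my @code_points = map { ord($_) } split //, $chars;

print join(' ', map { sprintf 'U+%04X', $_ } @code_points), "\n";   # U+0041 U+20AC
```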

Courage, the Cowardly Dog

Re: Re: Unicode Pack/Unpack Woes by The Ninja K (Novice) on Jan 12, 2003 at 08:45 UTC
Thanks for the insight into things, courage. While I still think something is amiss, and so I'll post again, use Encode did solve my problems. However, here's what I don't get:

use Text::Iconv;
use Encode;
my $s2u = Text::Iconv->new("sjis", "utf-8");
my $x = $s2u->convert($str);
$x = decode("utf-8", $x);  # Causes Perl to correctly treat the string as Unicode because the UTF-8 flag is on [oi].

I should not have to decode something that is already in UTF-8 format, no? Anyway, thanks to that, this works and I can move on. But something still doesn't seem right...
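
The situation in this follow-up can be sketched with Encode alone. Text::Iconv is not used here: its conversion is simulated by an encode/decode round trip, assuming (as the experience above suggests) that convert() hands back UTF-8 bytes in an ordinary byte string. The two kanji are made-up sample data.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical Shift_JIS input: the two kanji U+65E5 U+672C.
my $sjis = encode('shiftjis', "\x{65E5}\x{672C}");

# Stand-in for the Text::Iconv conversion: the result is a string of
# UTF-8 *bytes*, which Perl still treats as one element per byte.
my $utf8_octets = encode('UTF-8', decode('shiftjis', $sjis));

# decode() reinterprets those bytes as characters.
my $chars = decode('UTF-8', $utf8_octets);

printf "Shift_JIS bytes: %d\n", length $sjis;          # 4
printf "UTF-8 octets:    %d\n", length $utf8_octets;   # 6
printf "characters:      %d\n", length $chars;         # 2
printf "flag on octets: %s, on chars: %s\n",
    utf8::is_utf8($utf8_octets) ? 'yes' : 'no',
    utf8::is_utf8($chars)       ? 'yes' : 'no';

# Encode can also do the whole job in one step, without Text::Iconv.
my $direct = decode('shiftjis', $sjis);
print "same result\n" if $direct eq $chars;
```

Being "in UTF-8 format" only means the bytes are UTF-8 encoded; Perl still needs decode() to know they should be read as characters.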
