in reply to Unicode Pack/Unpack Woes

If you do not "use utf8;", then it is predictable that your line my @bytes = unpack("U*",$input); ends up splitting the input into bytes rather than characters.

I advise you to write "use utf8;" at the very start of the program, and to use local scopes with "no utf8;" where needed.

Also, you really do not need to re-implement the utf8<->code-point character transformation. It can be done more simply:

use utf8;
use Encode;
my $utf8str = decode("ucs-2", $input); # or "euc-jp", depending on $input
See "perldoc Encode" for what I mean; it contains a lot of answers to your questions!
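
As a rough sketch of the idea (the sample bytes are a made-up input of my own, assumed to be UTF-8 rather than your actual data): decode the bytes into a character string first, and only then unpack it into code points.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: the raw UTF-8 bytes of a three-character
# Japanese string (U+65E5 U+672C U+8A9E). This is an assumption
# made for illustration, not the original poster's data.
my $input = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";

# decode() turns the raw bytes into a Perl character string.
my $chars = decode("UTF-8", $input);

# On the decoded string, unpack("U*") yields one code point per character.
my @code_points = unpack("U*", $chars);

printf "%d bytes, %d characters\n", length($input), length($chars);
printf "U+%04X\n", $_ for @code_points;
```

Swap "UTF-8" for "ucs-2", "euc-jp" or whatever encoding your $input really is in; the rest stays the same.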

Courage, the Cowardly Dog

Re: Re: Unicode Pack/Unpack Woes
by The Ninja K (Novice) on Jan 12, 2003 at 08:45 UTC
    Thanks for the insight into things, courage.

    I still think something is amiss, so I'll post again, but "use Encode" did solve my problems. However, here's what I don't get:
    $s2u = Text::Iconv->new("sjis", "utf-8");
    $x = $s2u->convert($str);
    $x = decode("utf-8", $x); # Causes Perl to correctly treat the string as unicode because utf-8 flag is on [oi].

    I should not have to decode something that should already be in utf-8 format, no?
    But anyway, thanks to that, this works and I can move on.
    Something still doesn't seem right, though...