bluewhale has asked for the wisdom of the Perl Monks concerning the following question:

This script connects to an MSSQL database, reads data, and populates a pop-up window with it. Everything works fine except that special characters (such as Chinese/Japanese characters) are NOT displayed properly (?? is displayed instead). In other words, the encoding is getting lost. The special characters look fine in the database, and the HTML pop-up window is UTF-8 encoded, so it's the Perl script that's causing the issue. Thanks in advance!! @items holds the data from the database.
my $count=0;
foreach my $row (@items) {
    my $id = @$row[0];
    my $name = @$row[1];
    $logger->debug("INITIAL ID: $id");
    $logger->debug("INITIAL Name: $name");
    eval {Encode::from_to($name, 'utf-16le', 'utf-8', Encode::FB_CROAK);};
    #eval {Encode::from_to($name, 'ucs-2', 'utf-8', Encode::FB_CROAK);};
    if ($@) {
        my $error_message = qq(Can Not Encode to UTF-8: $@);
        $logger->debug("$error_message"); # Log out $error_message
    }
    print <<"END";
$name
END
    $count=$count+1;
}

Replies are listed 'Best First'.
Re: Encoding problem in perl
by ikegami (Patriarch) on Jul 30, 2009 at 21:14 UTC

    Your test isn't runnable, so I had to adapt it a bit:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Encode qw( encode from_to );

    print("Content-Type: text/html; charset=utf-8\n\n");

    my $name = encode('utf-16le', "\x{65E5}\x{672C}\x{8A9E}");
    eval { from_to($name, 'utf-16le', 'utf-8', Encode::FB_CROAK); 1 }
        or print("croaked\n");
    print $name;

    It works fine. It displays "Japanese (language)" in Japanese.

    From the prompt, I get:

    $ test.cgi | od -c
    0000000   C   o   n   t   e   n   t   -   T   y   p   e   :       t   e
    0000020   x   t   /   h   t   m   l   ;       c   h   a   r   s   e   t
    0000040   =   u   t   f   -   8  \n  \n 346 227 245 346 234 254 350 252
    0000060 236
    0000061

    Find what differs, and you'll find which bad assumption you made.

    By the way, the following is a better model:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Encode qw( encode decode );

    binmode(STDOUT, ':encoding(utf-8)');
    print("Content-Type: text/html; charset=utf-8\n\n");

    # Input
    my $name = encode('utf-16le', "\x{65E5}\x{672C}\x{8A9E}");
    eval { $name = decode('utf-16le', $name, Encode::FB_CROAK); 1 }
        or print("croaked\n");

    # Process
    # ...

    # Output
    print $name;
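
    Applied to a loop over rows, the same decode-on-input, encode-on-output model might look roughly like the sketch below. It assumes the name column really does arrive as UTF-16LE bytes, which is exactly the assumption worth verifying. The @items contents are made-up stand-ins for the database rows, and decode_name is a hypothetical helper, not something from the original script.

    ```perl
    use strict;
    use warnings;
    use Encode qw( encode decode );

    # Output layer: every print to STDOUT is encoded to UTF-8 once, here.
    binmode(STDOUT, ':encoding(UTF-8)');

    # Hypothetical stand-in for the rows fetched from the database.
    # Assumption: the name column arrives as UTF-16LE bytes.
    my @items = (
        [ 1, encode('UTF-16LE', "\x{65E5}\x{672C}\x{8A9E}") ],
        [ 2, encode('UTF-16LE', "\x{260E}") ],
    );

    # Hypothetical helper: decode one raw name into characters.
    # Returns undef (and warns) if decode() croaks.
    sub decode_name {
        my ($raw) = @_;
        my $name = eval { decode('UTF-16LE', $raw, Encode::FB_CROAK) };
        warn "Cannot decode name: $@" if !defined $name;
        return $name;
    }

    for my $row (@items) {
        my ($id, $raw) = @$row;

        # Input: bytes become characters.
        my $name = decode_name($raw);
        next if !defined $name;

        # Output: the STDOUT layer turns characters into UTF-8 bytes.
        print "$id: $name\n";
    }
    ```

    The point of the split is that everything between input and output works on characters, so no step in the middle needs to know which encoding the database or the browser uses.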
Re: Encoding problem in perl
by moritz (Cardinal) on Jul 30, 2009 at 21:01 UTC
    Is the if ($@) condition ever triggered? If yes, I guess that the string is already decoded somehow, and must only be encode()d; or it might be in a different encoding than you think it is.
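
    For the "already decoded" case, a minimal sketch of what "only encode" means is shown here. It assumes, purely for illustration, that the database driver hands back characters rather than UTF-16LE bytes; the hard-coded $name is a stand-in for the real value.

    ```perl
    use strict;
    use warnings;
    use Encode qw( encode );

    # Assumption: the driver has already decoded the value, so $name holds
    # characters (here "Japanese (language)" in Japanese), not UTF-16LE bytes.
    my $name = "\x{65E5}\x{672C}\x{8A9E}";

    # Only encode; there is nothing left to decode.
    my $utf8 = encode('UTF-8', $name);

    # Show the resulting bytes in hex.
    print join(' ', unpack('(H2)*', $utf8)), "\n";   # e6 97 a5 e6 9c ac e8 aa 9e
    ```

    If the script's real output for the same name shows different bytes, the value was not a decoded string at that point.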

    A good way to debug this is to enter a single non-ASCII character into the DB, let's say a U+260E BLACK TELEPHONE, or ☎.

    Then you can inspect your strings with

    use Data::Dumper;
    $Data::Dumper::Useqq = 1;
    $logger->debug(Dumper $string);

    As UTF-16LE that's encoded as 0e 26; as UTF-8 it's encoded as e2 98 8e.
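
    Those byte values can be reproduced with a short script such as the sketch below. hex_bytes is a hypothetical helper name introduced here for illustration.

    ```perl
    use strict;
    use warnings;
    use Encode qw( encode );

    # U+260E BLACK TELEPHONE as a decoded (character) string.
    my $char = "\x{260E}";

    # Hypothetical helper: encode $string and return its bytes as
    # space-separated lowercase hex.
    sub hex_bytes {
        my ($encoding, $string) = @_;
        return join ' ', unpack('(H2)*', encode($encoding, $string));
    }

    print "UTF-16LE: ", hex_bytes('UTF-16LE', $char), "\n";   # 0e 26
    print "UTF-8:    ", hex_bytes('UTF-8',    $char), "\n";   # e2 98 8e
    ```

    Running the same unpack over the value as it comes back from the database, and again just before it is printed, gives something to compare against these two reference patterns.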

    Armed with that knowledge, you can check where the character encoding starts to diverge from your expectations.