Encoding problem in perl

bluewhale has asked for the wisdom of the Perl Monks concerning the following question:

This script connects to an MSSQL database, reads data, and displays it in a pop-up window. Everything works except that non-ASCII characters (such as Chinese or Japanese text) are NOT displayed properly: ?? appears instead. In other words, the encoding is getting lost. The characters look fine in the database, and the HTML pop-up window is UTF-8 encoded, so it seems to be the Perl script that is causing the issue. @items holds the data from the database. Thanks in advance!

my $count=0;
foreach my $row (@items)
{ 
     my $id = $row->[0];
     my $name = $row->[1];
 
     $logger->debug("INITIAL ID: $id");
     $logger->debug("INITIAL Name: $name");
     eval {Encode::from_to($name, 'utf-16le', 'utf-8', Encode::FB_CROAK);};
     #eval {Encode::from_to($name, 'ucs-2', 'utf-8', Encode::FB_CROAK);};
 
        if ($@) {
             my $error_message = qq(Can Not Encode to UTF-8: $@);
             $logger->debug("$error_message"); # Log out $error_message
         }
 
 
print <<"END";
$name
END
$count=$count+1;
}

Replies are listed 'Best First'.
Re: Encoding problem in perl by ikegami (Patriarch) on Jul 30, 2009 at 21:14 UTC
Your test isn't runnable, so I had to adapt it a bit:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Encode qw( encode from_to );

    print("Content-Type: text/html; charset=utf-8\n\n");

    my $name = encode('utf-16le', "\x{65E5}\x{672C}\x{8A9E}");
    eval { from_to($name, 'utf-16le', 'utf-8', Encode::FB_CROAK); 1 }
        or print("croaked\n");
    print $name;

It works fine. It displays "Japanese (language)" in Japanese. From the prompt, I get:

    $ test.cgi | od -c
    0000000   C   o   n   t   e   n   t   -   T   y   p   e   :       t   e
    0000020   x   t   /   h   t   m   l   ;       c   h   a   r   s   e   t
    0000040   =   u   t   f   -   8  \n  \n 346 227 245 346 234 254 350 252
    0000060 236
    0000061

Find what differs, and you'll find which bad assumption you made.

By the way, the following is a better model:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Encode qw( encode decode );

    binmode(STDOUT, ':encoding(utf-8)');
    print("Content-Type: text/html; charset=utf-8\n\n");

    # Input
    my $name = encode('utf-16le', "\x{65E5}\x{672C}\x{8A9E}");
    eval { $name = decode('utf-16le', $name, Encode::FB_CROAK); 1 }
        or print("croaked\n");

    # Process
    # ...

    # Output
    print $name;
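
The decode, process, encode model above could be applied to the loop from the question roughly as follows. This is only a sketch: it assumes the name column really arrives as raw UTF-16LE bytes, which the question implies but does not confirm. The `@items` data is hand-built sample data, and `decode_name` is a hypothetical helper, not part of Encode.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( encode decode );

# Hypothetical helper: turn raw UTF-16LE bytes into a Perl character string.
# Returns undef if decode() croaks on the input.
sub decode_name {
    my ($bytes) = @_;
    # Work on a copy: with a CHECK value set, decode() may modify its argument.
    my $copy = $bytes;
    my $name = eval { decode('UTF-16LE', $copy, Encode::FB_CROAK) };
    return $name;
}

# Sample rows standing in for the database result (an assumption).
my @items = (
    [ 1, encode('UTF-16LE', "\x{65E5}\x{672C}\x{8A9E}") ],
    [ 2, encode('UTF-16LE', "plain ASCII") ],
);

# Encode exactly once, at the output boundary.
binmode(STDOUT, ':encoding(UTF-8)');
print "Content-Type: text/html; charset=utf-8\n\n";

for my $row (@items) {
    my ($id, $raw) = @$row;
    my $name = decode_name($raw);
    if (!defined $name) {
        warn "row $id: could not decode name as UTF-16LE\n";
        next;
    }
    # $name now holds characters, so length() counts characters, not bytes.
    printf "%d: %s (%d characters)\n", $id, $name, length($name);
}
```

Keeping strings decoded inside the program means string functions such as `length` and regular expressions operate on characters, and the output layer takes care of producing UTF-8.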
Re: Encoding problem in perl by moritz (Cardinal) on Jul 30, 2009 at 21:01 UTC
Is the `if ($@)` condition ever triggered? If yes, I guess that the string is already decoded somehow and only needs to be encode()d, or it might be in a different encoding than you think it is.

A good way to debug this is to enter a single non-ASCII character into the DB, let's say U+260E BLACK TELEPHONE (☎). Then you can inspect your strings with:

    use Data::Dumper;
    $Data::Dumper::Useqq = 1;
    $logger->debug(Dumper $string);

As UTF-16LE that character is encoded as `0e 26`; as UTF-8 it is encoded as `e2 98 8e`. Armed with that knowledge, you can check where the character encoding starts to diverge from your expectations.
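
The byte sequences mentioned above can be reproduced with a short standalone script, which may help when comparing against what the logger shows. This is a sketch only: `hex_bytes` is a hypothetical helper, and the script prints to STDOUT rather than using the `$logger` object from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( encode );
use Data::Dumper;

$Data::Dumper::Useqq = 1;   # show non-printable and high bytes as escapes
$Data::Dumper::Terse = 1;   # omit the "$VAR1 = " prefix

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', unpack('(H2)*', $bytes);
}

my $phone = "\x{260E}";   # U+260E BLACK TELEPHONE as a character string

print 'UTF-16LE: ', hex_bytes(encode('UTF-16LE', $phone)), "\n";   # 0e 26
print 'UTF-8:    ', hex_bytes(encode('UTF-8',    $phone)), "\n";   # e2 98 8e

# Dumper gives an escaped view of the same UTF-8 bytes.
print Dumper(encode('UTF-8', $phone));
```

Running the same `hex_bytes` call on the value read from the database, before any conversion, shows which of the two byte patterns (if either) the driver actually returns.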