Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

Re: Perl DBI can't display Chinese text?

by moritz (Cardinal)
on Oct 13, 2013 at 14:23 UTC ( [id://1058075]=note: print w/replies, xml ) Need Help??


in reply to Perl DBI can't display Chinese text?

What character encoding does your terminal window expect?

(Also, see http://perlgeek.de/en/article/encodings-and-unicode).

Replies are listed 'Best First'.
Re^2: Perl DBI can't display Chinese text?
by Thai Heng (Beadle) on Oct 15, 2013 at 01:25 UTC
    thanks! The article help me solve this problem as follows:
    while(my @line = $sth->fetchrow_array()){ my ($name,$deptcode) = @line; $name = encode("MacChineseSimp", $name); print "one line is ------$name,\t$deptcode\n"; }

      I've now had a chance to look at this more as I was a bit surprised you had to do this.

      The main issue is you are using verchar columns and not nvarchar columns and you did not tell us how the data got into the database in the first place. Let's assume the data was inserted into the database with DBD::ODBC in the first place and see what happens

      use strict; use warnings; use DBI qw(:utils :sql_types); use 5.016; use Data::Dumper; use Encode; my $h = DBI->connect() or die $DBI::errstr; $h->{RaiseError} = 1; eval { $h->do(q/drop table chinese/); }; $h->do(q/create table chinese (n varchar(30), d varchar(12))/); my $str = "\x{20020}\x{24202}\x{31185}\x{23460}"; my $encoded = encode('UTF-8', $str); foreach my $s (split(//, $encoded)) { print sprintf("%x,", ord($s)); } say ""; say data_string_desc($str); my $s = $h->prepare(q/insert into chinese values(?,?)/); $s->execute($str, "111"); $s = $h->prepare(q/select cast(n as varbinary) from chinese/); $s->execute; my $r = $s->fetchall_arrayref; foreach my $s (split(//, $r->[0][0])) { print sprintf("%x,", ord($s)); } say ""; say data_string_desc($r->[0][0]); my $d = decode('UTF-8', $r->[0][0]); say data_string_desc($d); print Dumper(\$d);

      which outputs:

      f0,a0,80,a0,f0,a4,88,82,f0,b1,86,85,f0,a3,91,a0, UTF8 on, non-ASCII, 4 characters 16 bytes f0,a0,80,a0,f0,a4,88,82,f0,b1,86,85,f0,a3,91,a0, UTF8 off, non-ASCII, 16 characters 16 bytes UTF8 on, non-ASCII, 4 characters 16 bytes $VAR1 = \"\x{20020}\x{24202}\x{31185}\x{23460}";

      When I look under the hood at what ODBC APIs DBD::ODBC is calling I see DBD::ODBC took the 4 chinese characters that are UTF-8 encoded in Perl (see first lines of output) and it converted them to UTF16 and bound them as SQL_WCHARs. Note, varchar columns are not really designed to store unicode data. If we convert what SQL Server thinks it has in the column to varbinary and read it back we miracoulously get the chinese string UTF-8 encoded back (the 3rd/4th lines of output). Unsurprisingly we can decode that UTF-8 back into a string in Perl (the last 2 lines of output). In other words, SQL Server does not know that is UTF-8 encoded data and length functions and collation will not work.

      If we rewrite the select code to a simple "select * from chinese" we seem to get back rubbish. What has happened here is that DBD::ODBC bound what it believed to be char data as SQL_WCHARs, SQL Server passed back UTF16 encoded representation of the 16 bytes and DBD::ODBC decoded it to UTF-8 and hence it is now double encoded UTF-8.

      If we had changed the code above to bind the input data as SQL_WCHARs SQL Server would have just put a load of question marks in the column as it cannot do what you want.

      The correct way to do this with SQL Server is to make the column nvarchar and then everything will just work (except collations when characters don't fit into UCS-2 until SQL Server 2012 - see Inserting unicode characters > 0xFFFF (surrogate pairs) into MS SQL Server with Perl DBD::ODBC for why).

      You shouldn't have to do that call to encode with DBD::ODBC. Your trace shows the column was retrieved as SQL_WCHARs and DBD::ODBC should have encoded the data correctly for you (believe me, I maintain DBD::ODBC). What versions of DBI and DBD::ODBC are you using?

      Could you provide a small simple example which creates a table like yours, inserts the chinese data and reads it back and I will take a look at it.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1058075]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others rifling through the Monastery: (4)
As of 2024-03-29 10:30 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found