Perl DBI can't display Chinese text?

Thai Heng has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Perl DBI can't display Chinese text? by LanX (Saint) on Oct 13, 2013 at 13:54 UTC
> I use UTF8 ,but that's not solution. How exactly do you use UTF8? There a multiple ways! Plz show us some minimal code reproducing the problem! see also How do I post a question effectively? Cheers Rolf ( addicted to the Perl Programming Language)	[reply]
Re: Perl DBI can't display Chinese text? by moritz (Cardinal) on Oct 13, 2013 at 14:23 UTC
What character encoding does your terminal window expect? (Also, see http://perlgeek.de/en/article/encodings-and-unicode). Perl 6 - the future is here, just unevenly distributed	[reply]
Re^2: Perl DBI can't display Chinese text? by Thai Heng (Beadle) on Oct 15, 2013 at 01:25 UTC
thanks! The article help me solve this problem as follows: `while(my @line = $sth->fetchrow_array()){ my ($name,$deptcode) = @line; $name = encode("MacChineseSimp", $name); print "one line is ------$name,\t$deptcode\n"; }` [download]	[reply] [d/l]
Re^3: Perl DBI can't display Chinese text? by mje (Curate) on Oct 18, 2013 at 16:24 UTC
I've now had a chance to look at this more as I was a bit surprised you had to do this. The main issue is you are using verchar columns and not nvarchar columns and you did not tell us how the data got into the database in the first place. Let's assume the data was inserted into the database with DBD::ODBC in the first place and see what happens use strict; use warnings; use DBI qw(:utils :sql_types); use 5.016; use Data::Dumper; use Encode; my $h = DBI->connect() or die $DBI::errstr; $h->{RaiseError} = 1; eval { $h->do(q/drop table chinese/); }; $h->do(q/create table chinese (n varchar(30), d varchar(12))/); my $str = "\x{20020}\x{24202}\x{31185}\x{23460}"; my $encoded = encode('UTF-8', $str); foreach my $s (split(//, $encoded)) { print sprintf("%x,", ord($s)); } say ""; say data_string_desc($str); my $s = $h->prepare(q/insert into chinese values(?,?)/); $s->execute($str, "111"); $s = $h->prepare(q/select cast(n as varbinary) from chinese/); $s->execute; my $r = $s->fetchall_arrayref; foreach my $s (split(//, $r->[0][0])) { print sprintf("%x,", ord($s)); } say ""; say data_string_desc($r->[0][0]); my $d = decode('UTF-8', $r->[0][0]); say data_string_desc($d); print Dumper(\$d); [download] which outputs: `f0,a0,80,a0,f0,a4,88,82,f0,b1,86,85,f0,a3,91,a0, UTF8 on, non-ASCII, 4 characters 16 bytes f0,a0,80,a0,f0,a4,88,82,f0,b1,86,85,f0,a3,91,a0, UTF8 off, non-ASCII, 16 characters 16 bytes UTF8 on, non-ASCII, 4 characters 16 bytes $VAR1 = \"\x{20020}\x{24202}\x{31185}\x{23460}";` [download] When I look under the hood at what ODBC APIs DBD::ODBC is calling I see DBD::ODBC took the 4 chinese characters that are UTF-8 encoded in Perl (see first lines of output) and it converted them to UTF16 and bound them as SQL_WCHARs. Note, varchar columns are not really designed to store unicode data. If we convert what SQL Server thinks it has in the column to varbinary and read it back we miracoulously get the chinese string UTF-8 encoded back (the 3rd/4th lines of output). Unsurprisingly we can decode that UTF-8 back into a string in Perl (the last 2 lines of output). In other words, SQL Server does not know that is UTF-8 encoded data and length functions and collation will not work. If we rewrite the select code to a simple "select * from chinese" we seem to get back rubbish. What has happened here is that DBD::ODBC bound what it believed to be char data as SQL_WCHARs, SQL Server passed back UTF16 encoded representation of the 16 bytes and DBD::ODBC decoded it to UTF-8 and hence it is now double encoded UTF-8. If we had changed the code above to bind the input data as SQL_WCHARs SQL Server would have just put a load of question marks in the column as it cannot do what you want. The correct way to do this with SQL Server is to make the column nvarchar and then everything will just work (except collations when characters don't fit into UCS-2 until SQL Server 2012 - see Inserting unicode characters > 0xFFFF (surrogate pairs) into MS SQL Server with Perl DBD::ODBC for why).	[reply] [d/l] [select]
Re^3: Perl DBI can't display Chinese text? by mje (Curate) on Oct 16, 2013 at 08:17 UTC
You shouldn't have to do that call to encode with DBD::ODBC. Your trace shows the column was retrieved as SQL_WCHARs and DBD::ODBC should have encoded the data correctly for you (believe me, I maintain DBD::ODBC). What versions of DBI and DBD::ODBC are you using? Could you provide a small simple example which creates a table like yours, inserts the chinese data and reads it back and I will take a look at it.	[reply]
Re: Perl DBI can't display Chinese text? by graff (Chancellor) on Oct 13, 2013 at 21:45 UTC
Do you have some method other than Perl for inspecting the contents of the database? It's important to know for sure that the database contains "correct" data in the first place. If that's true, the next thing is to make sure what character encoding is being used by the database server to store (or return) the data. If the database stores/returns data in UTF-8 encoding, the next thing is to do just one of the following (whichever one is easiest or makes the most sense): 1. Figure out how to configure your perl DBI connection to the database, so that perl will know that it's getting UTF-8 data in response to queries. OR 2. Connect using the "easiest" (default) method, and `use Encode;` process each string you get from a query like this: `my $usable_string = decode( 'utf8', $string_from_database );` [download] The latter approach would also work if you find out that the database server returns strings using some other encoding (e.g. "gb2312" or whatever) - just use that other encoding in place of "utf8" in the `decode()` call, and that will turn the database string into perl-internal (usable) utf8. If you have trouble, you'll need to show us (1) an example of data you expect to get back (because you've seen this data using some other tool to query the database), (2) the perl code you used to try getting the string, and (3) what you actually got from your perl script. As indicated in a previous reply above, it may also be important to ensure that you are using a terminal or other display method that you are sure is able to "do the right thing" with the text in question.	[reply] [d/l] [select]
Re: Perl DBI can't display Chinese text? by mje (Curate) on Oct 14, 2013 at 08:21 UTC
What platform are you running this on? DBD::ODBC only builds by default using the ODBC Unicode API on Windows. On UNIX you need to add the -u switch to Makefile.PL. What ODBC driver are you using? (as not all of them support unicode). Whilst you are at it show us your DBI and DBD::ODBC versions - you can do that with: `perl -MDBI -le "print $DBI::VERSION" perl -MDBD::ODBC -le "print $DBD::ODBC::VERSION"` [download] Are you sure the column containing chinese text is known to SQL Server as Chinese - normally data like this goes into nvarchar columns. Even if DBD::ODBC is built using the unicode API, the ODBC driver needs to tell DBD::ODBC the column is of type SQL_WCHAR - we will only see that if you provide a trace. Assuming you have a recent DBI and DBD::ODBC you can produce a trace like this: `# on windows set DBI_TRACE=DBD=x.log run your perl program here # on unix export DBI_TRACE=DBD=x.log run your perl program here` [download] The logging will end up in the file x.log. If that does not produce any logging then your DBI and/or DBD::ODBC are too old so replace the "DBD" above with "15" (you'll get a lot more logging, most of which is not required).	[reply] [d/l] [select]
Re^2: Perl DBI can't display Chinese text? by Thai Heng (Beadle) on Oct 14, 2013 at 22:50 UTC
Thanks mje! According to your suggest, I trace my perl code. The x.log as follows: Unicode login6 dbname=SQLSERVER2008R2, uid='sa', pwd=xxxxx SQLDriverConnect 'SQLSERVER2008R2', ''sa'', 'xxxx' SQLDriverConnectW failed: SQLConnect ''SQLSERVER2008R2'', ''sa'' !!dbd_error2(err_rc=1, what=db_login6sv/SQLConnectW, handles=(26a8 +398,20caeb0,0) !SQLError(26a8398,20caeb0,0) = (01000, 5701, [Microsoft][SQL Serve +r Native Client 10.0][SQL Server]ӑ��˽�ݿ㊏&#102 +6;τ��'ECISAN_JX'��) !SQLError(26a8398,20caeb0,0) = (01000, 5703, [Microsoft][SQL Serve +r Native Client 10.0][SQL Server]ӑ��ԯҔʨ׃ +���󍦖юġ�) Turning autocommit on DRIVER_ODBC_VER = 03.52 DRIVER_NAME = sqlncli10.dll, type=2 DRIVER_VERSION = 10.50.4000 MAX_COLUMN_NAME_LEN = 128 DBD::ODBC is unicode built : YES SQLMoreResults supported: 1 SQLDescribeParam supported: 1 !!DBD::ODBC unsupported attribute passed (PrintError) setting AutoCommit !!DBD::ODBC unsupported attribute passed (Username) !!DBD::ODBC unsupported attribute passed (dbi_connect_closure) initializing sth query timeout to -1 ignore named placeholders = 0 SQLPrepare select top 1000 name,create_date from sys.objects where + type = 'U' and charindex('t_',name) > 0 order by create_date Processing non-utf8 sql in unicode mode SQLPrepare = 0 initializing sth query timeout to -1 ignore named placeholders = 0 SQLPrepare select name,deptcode from t_department Processing non-utf8 sql in unicode mode SQLPrepare = 0 +dbd_st_execute(2683be4) dbd_st_finish(2683be4) outparams = 0 SQLExecute/SQLExecDirect(20c6630)=0 SQLRowCount=0 (rows=-1) SQLNumResultCols=0 (flds=2) dbd_describe done_desc=0 dbd_describe SQLNumResultCols=0 (columns=2) DescribeCol column = 1, name = n, namelen = 4, type = VARCHAR(12), + precision/column size = 50, scale = 0, nullable = 0 SQL_COLUMN_DISPLAY_SIZE = 50 SQL_COLUMN_LENGTH = 50 now using col 1: type = UNICODE CHAR (-8), len = 102, display siz +e = 102, prec = 50, scale = 0 DescribeCol column = 2, name = d, namelen = 8, type = VARCHAR(12), + precision/column size = 10, scale = 0, nullable = 0 SQL_COLUMN_DISPLAY_SIZE = 10 SQL_COLUMN_LENGTH = 10 now using col 2: type = UNICODE CHAR (-8), len = 22, display size + = 22, prec = 10, scale = 0 -dbd_describe done_bind=0 have 2 fields -dbd_st_execute(2683be4)=-1 bind_columns fbh=2716b94 fields=2 Bind 1: type = UNICODE CHAR(-8), buf=2180494, buflen=102 Bind 2: type = UNICODE CHAR(-8), buf=21804fc, buflen=22 bind_columns=0 SQLFetch=0 fetch num_fields=2 fetch col#1 n datalen=8 displ=102 SQL_C_WCHAR data = "临床科室" fetch col#2 d datalen=2 displ=22 SQL_C_WCHAR data = '1' SQLFetch=0 fetch num_fields=2 ... ... [download] In my compute, chinese display well in x.log. In print text(cmd), there is such info as follows: `Wide character in print at E:\temp\DBI_HG_Study_20130912\ConnectSQL +Server.pl line 122.` [download] What's the question key?	[reply] [d/l] [select]
Re^3: Perl DBI can't display Chinese text? by graff (Chancellor) on Oct 15, 2013 at 01:55 UTC
Regarding this message: `Wide character in print at E:\...\ConnectSQLServer.pl line 122.` [download] Figure out what file handle is being printed to at line 122 of that script. If it's STDOUT, then somewhere before you reach that statement, you should do: `binmode STDOUT, ':utf8';` [download] If it's some file handle that you open, add ':utf8' to the mode portion of the open call - e.g. `open( my $output_handle, '>:utf8', $output_name ) or die $!;` [download]	[reply] [d/l] [select]
Re^3: Perl DBI can't display Chinese text? by mje (Curate) on Oct 15, 2013 at 09:33 UTC
So you are running on Windows, your DBD::ODBC is a unicode supprting build and you are using the MS SQL Server native client driver. We still don't know your DBI and DBD::ODBC versions. I slightly feel you are drip feeding us information. As far as I can see that log is showing chinese characters returned. Characters 临, 床, 科 and 室 all look like chinese unicode characters. When you fetch that column you can pass it to data_string_desc and DBI will show you more information. If you do what graff told you and your terminal is set up correctly you should be able to print that string.	[reply]
Re: Perl DBI can't display Chinese text? by Anonymous Monk on Oct 14, 2013 at 10:56 UTC
Here's how to figure how Perl sees the incoming data: `use Data::Dumper; $Data::Dumper::Useqq = 1; ... print Dumper(\@row);` [download] Paste some sample output from that and we can help you better. (Also, the ungarbled versions of the same content would help, too.)	[reply] [d/l]