Thai Heng has asked for the wisdom of the Perl Monks concerning the following question:
    use strict;
    use warnings;
    use DBI;

    ### Connect to the database
    my $dbh = DBI->connect( "dbi:ODBC:hengaini", "dba", "sql", { RaiseError => 1 } );

    my $sth = $dbh->prepare("select * from dw");
    $sth->execute() or die "Can't execute SQL statement:$DBI::errstr";

    my @row;
    while ( @row = $sth->fetchrow_array() ) {
        print "Row:@row\n";
    }

Replies are listed 'Best First'.

Re: Perl DBI can't display Chinese text?
by LanX (Saint) on Oct 13, 2013 at 13:54 UTC
How exactly do you use UTF-8? There are multiple ways. Please show us some minimal code that reproduces the problem. See also "How do I post a question effectively?". Cheers, Rolf (addicted to the Perl Programming Language)

Re: Perl DBI can't display Chinese text?
by moritz (Cardinal) on Oct 13, 2013 at 14:23 UTC
What character encoding does your terminal window expect? (Also, see http://perlgeek.de/en/article/encodings-and-unicode).
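Once the terminal's encoding is known, output can be encoded to match it. The sketch below assumes, purely for illustration, a console that expects cp936 (GBK); the sample text and the `$terminal_encoding` variable are not from the thread, so substitute whatever your terminal actually uses.

```perl
use strict;
use warnings;
use utf8;                       # this source file is assumed to be saved as UTF-8
use Encode qw(encode);

# Hypothetical sample text standing in for a value read from the database.
my $text = "临床科室";

# ASSUMPTION: the console expects cp936 (GBK). Replace with your terminal's encoding.
my $terminal_encoding = 'cp936';

# Show how the character count and the encoded byte count differ.
my $bytes = encode($terminal_encoding, $text);
printf "%d characters become %d bytes in %s\n",
    length($text), length($bytes), $terminal_encoding;

# From here on, everything printed to STDOUT is encoded for the terminal.
binmode STDOUT, ":encoding($terminal_encoding)";
print "$text\n";
```

If the terminal expects UTF-8 instead, setting `$terminal_encoding` to `'UTF-8'` is the only change needed.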
by Thai Heng (Beadle) on Oct 15, 2013 at 01:25 UTC
by mje (Curate) on Oct 18, 2013 at 16:24 UTC
I've now had a chance to look at this more, as I was a bit surprised you had to do this. The main issue is that you are using varchar columns rather than nvarchar columns, and you did not tell us how the data got into the database in the first place. Let's assume the data was inserted into the database with DBD::ODBC in the first place and see what happens
which outputs:
When I look under the hood at which ODBC APIs DBD::ODBC is calling, I see that DBD::ODBC took the 4 Chinese characters that are UTF-8 encoded in Perl (see the first lines of output), converted them to UTF-16 and bound them as SQL_WCHARs. Note that varchar columns are not really designed to store Unicode data. If we convert what SQL Server thinks it has in the column to varbinary and read it back, we miraculously get the Chinese string back UTF-8 encoded (the 3rd/4th lines of output). Unsurprisingly, we can decode that UTF-8 back into a string in Perl (the last 2 lines of output). In other words, SQL Server does not know that the data is UTF-8 encoded, so length functions and collation will not work.

If we rewrite the select code to a simple "select * from chinese", we seem to get back rubbish. What has happened here is that DBD::ODBC bound what it believed to be char data as SQL_WCHARs, SQL Server passed back a UTF-16 encoded representation of the 16 bytes, and DBD::ODBC decoded it to UTF-8, so it is now double-encoded UTF-8. If we had changed the code above to bind the input data as SQL_WCHARs, SQL Server would have just put a load of question marks in the column, as it cannot do what you want.

The correct way to do this with SQL Server is to make the column nvarchar, and then everything will just work (except collations when characters don't fit into UCS-2, until SQL Server 2012; see "Inserting unicode characters > 0xFFFF (surrogate pairs) into MS SQL Server with Perl DBD::ODBC" for why).
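The double-encoding effect described above can be reproduced without a database. This is only a sketch using the core Encode module: the sample text is hypothetical, and ISO-8859-1 is a stand-in for whatever single-byte code page the server actually applies to a varchar column. It is not the code from the post.

```perl
use strict;
use warnings;
use utf8;                        # this source file is assumed to be saved as UTF-8
use Encode qw(encode decode);

my $text       = "临床科室";               # hypothetical sample: 4 characters
my $utf8_bytes = encode('UTF-8', $text);   # 3 bytes per character here

# Simulate a layer that treats every byte as a separate single-byte
# character (ASSUMPTION: ISO-8859-1) and then UTF-8 encodes the result again.
my $byte_per_char  = decode('ISO-8859-1', $utf8_bytes);
my $double_encoded = encode('UTF-8', $byte_per_char);

# Undo it: decode once, map the characters back to bytes, decode again.
my $once     = decode('UTF-8', $double_encoded);
my $repaired = decode('UTF-8', encode('ISO-8859-1', $once));

printf "characters: %d, UTF-8 bytes: %d, double-encoded bytes: %d\n",
    length($text), length($utf8_bytes), length($double_encoded);
print "round trip ", ($repaired eq $text ? "succeeded" : "failed"), "\n";
```

Repairing data this way only hides the problem; as the post says, storing the text in an nvarchar column avoids it.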
by mje (Curate) on Oct 16, 2013 at 08:17 UTC
You shouldn't have to make that call to encode with DBD::ODBC. Your trace shows the column was retrieved as SQL_WCHARs, and DBD::ODBC should have encoded the data correctly for you (believe me, I maintain DBD::ODBC). What versions of DBI and DBD::ODBC are you using? Could you provide a small, simple example that creates a table like yours, inserts the Chinese data and reads it back? I will take a look at it.

Re: Perl DBI can't display Chinese text?
by graff (Chancellor) on Oct 13, 2013 at 21:45 UTC
If that's true, the next thing is to find out which character encoding the database server uses to store (or return) the data. If the database stores/returns data in UTF-8 encoding, the next thing is to do just one of the following (whichever one is easiest or makes the most sense):

1. Figure out how to configure your Perl DBI connection to the database, so that Perl will know that it's getting UTF-8 data in response to queries.
OR
2. Connect using the "easiest" (default) method, load the Encode module, and run each string you get from a query through its decode() function with "utf8" as the encoding.

The latter approach would also work if you find out that the database server returns strings using some other encoding (e.g. "gb2312" or whatever): just use that other encoding in place of "utf8" in the decode() call, and that will turn the database string into Perl-internal (usable) utf8.

If you have trouble, you'll need to show us (1) an example of the data you expect to get back (because you've seen this data using some other tool to query the database), (2) the Perl code you used to try getting the string, and (3) what you actually got from your Perl script. As indicated in a previous reply above, it may also be important to ensure that you are using a terminal or other display method that you are sure is able to "do the right thing" with the text in question.
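The second option can be sketched as follows. The byte string stands in for a value returned by fetchrow_array(), and `decode_row` is a hypothetical helper introduced only for this illustration; it assumes every field in the row uses the same encoding.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for a fetched column value: the raw UTF-8 bytes of four
# Chinese characters (hypothetical sample data).
my $raw = "\xE4\xB8\xB4\xE5\xBA\x8A\xE7\xA7\x91\xE5\xAE\xA4";

# Turn the bytes into a Perl character string.
my $string = decode('utf8', $raw);
printf "%d bytes decoded into %d characters\n", length($raw), length($string);

# Hypothetical helper: decode every defined field of a row, leaving
# NULL (undef) fields untouched.
sub decode_row {
    my ($encoding, @fields) = @_;
    return map { defined($_) ? decode($encoding, $_) : undef } @fields;
}

my @decoded = decode_row('utf8', $raw, undef);
printf "row has %d fields\n", scalar @decoded;
```

For a server that returns another encoding, pass its name (for example 'gb2312') as the first argument instead of 'utf8'.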
Re: Perl DBI can't display Chinese text?
by mje (Curate) on Oct 14, 2013 at 08:21 UTC
What platform are you running this on? By default, DBD::ODBC only builds against the ODBC Unicode API on Windows; on UNIX you need to add the -u switch to Makefile.PL. Which ODBC driver are you using? (Not all of them support Unicode.) While you are at it, show us your DBI and DBD::ODBC versions. You can do that with:
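A minimal sketch of one way to report both versions. `module_version` is a hypothetical helper written for this illustration, not part of DBI; it reports 'not installed' when a module cannot be loaded.

```perl
use strict;
use warnings;

# Hypothetical helper: load a module by name and return its version,
# or 'not installed' if loading fails.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # DBD::ODBC -> DBD/ODBC.pm
    return eval { require $file; $module->VERSION } // 'not installed';
}

printf "%-10s %s\n", $_, module_version($_) for qw(DBI DBD::ODBC);
```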
Are you sure the column containing Chinese text is known to SQL Server as Chinese? Normally data like this goes into nvarchar columns. Even if DBD::ODBC is built using the Unicode API, the ODBC driver needs to tell DBD::ODBC that the column is of type SQL_WCHAR, and we will only see that if you provide a trace. Assuming you have a recent DBI and DBD::ODBC you can produce a trace like this:
The logging will end up in the file x.log. If that does not produce any logging then your DBI and/or DBD::ODBC are too old, so replace the "DBD" above with "15" (you'll get a lot more logging, most of which is not required).
by Thai Heng (Beadle) on Oct 14, 2013 at 22:50 UTC
On my computer, the Chinese text displays correctly in x.log. When the text is printed in cmd, the following information appears: What is the key to the problem?
by graff (Chancellor) on Oct 15, 2013 at 01:55 UTC
Figure out which file handle is being printed to at line 122 of that script. If it's STDOUT, then somewhere before you reach that statement, you should set the ':utf8' layer on it with binmode. If it's some file handle that you open, add ':utf8' to the mode portion of the open call.
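Both cases can be sketched in a few lines. The sample text and the file name `rows.txt` are hypothetical; the original script's line 122 is not shown in the thread, so this only illustrates the two techniques.

```perl
use strict;
use warnings;
use utf8;                        # this source file is assumed to be saved as UTF-8

my $text = "临床科室";           # hypothetical sample standing in for a fetched value

# Case 1: printing to STDOUT. Set the layer before the first print.
binmode STDOUT, ':utf8';
print "Row: $text\n";

# Case 2: printing to a file handle you open yourself. Put ':utf8' in the mode.
my $path = 'rows.txt';           # hypothetical output file
open(my $out, '>:utf8', $path) or die "Can't open $path: $!";
print {$out} "Row: $text\n";
close($out) or die "Can't close $path: $!";
```

Without either layer, Perl typically emits a "Wide character in print" warning for text like this.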
by mje (Curate) on Oct 15, 2013 at 09:33 UTC
So you are running on Windows, your DBD::ODBC is a Unicode-supporting build, and you are using the MS SQL Server native client driver. We still don't know your DBI and DBD::ODBC versions. I rather feel you are drip-feeding us information. As far as I can see, that log shows Chinese characters being returned. Characters 临, 床, 科 and 室 all look like Chinese Unicode characters. When you fetch that column you can pass it to data_string_desc and DBI will show you more information. If you do what graff told you and your terminal is set up correctly, you should be able to print that string.
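A minimal sketch of inspecting a value with DBI's data_string_desc utility. The sample text stands in for a fetched column, and the fallback branch is an assumption added so the script still runs where DBI is not installed; its wording is not DBI's.

```perl
use strict;
use warnings;
use utf8;                        # this source file is assumed to be saved as UTF-8

my $value = "临床科室";          # hypothetical stand-in for a fetched column value

# Use DBI's own description when DBI is available.
my $have_dbi = eval { require DBI; 1 };

my $description = $have_dbi
    ? DBI::data_string_desc($value)
    : sprintf('UTF8 flag %s, %d characters',    # simplified fallback, not DBI output
        (utf8::is_utf8($value) ? 'on' : 'off'), length($value));

print "$description\n";
```

A value that reports the UTF8 flag as off but a byte count three times the expected character count is a hint that it still holds undecoded UTF-8 bytes.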
Re: Perl DBI can't display Chinese text?
by Anonymous Monk on Oct 14, 2013 at 10:56 UTC
Paste some sample output from that and we can help you better. (Also, the ungarbled versions of the same content would help, too.)