set unicode in perl

xiaoyafeng has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I spent all of yesterday studying a weird problem with a script converted from VBS, but failed to solve it. Now I have to use DBI to store the data directly. Everything seems fine except one thing: columns containing Asian-language text can't be displayed. Below is my code.

use strict;
use warnings;

use DBI;
binmode(STDOUT, ":utf8");

#my $dbh = DBI->connect(qq(dbi:Oracle:Athenadb), qq(athena), qq(athena), {RaiseError => 1, ora_charset => 'AL32UTF8'});
my $dbh = DBI->connect(qq(dbi:ADO:Provider=MSDAORA.1;Data Source=AthenaDB), qq(athena), qq(athena), {RaiseError => 1});
my $statment = "select ADAS_ID, ADAS_Name from ADAS_DEVICE";
my $sth = $dbh->prepare($statment) or die $dbh->errstr;
$sth->execute or die $dbh->errstr;
while (my ($ADAS_ID, $ADAS_NAME) = $sth->fetchrow_array) {
    print "$ADAS_NAME ID is $ADAS_ID \n";
}

__OUTPUT__
&#24537;&#32892;&#40065;&#24537;&#32891;&#32884;&#24537;&#32883;&#25523;&#27667;&#27004;&#27004;&#33725;&#28510;?35 ID is 296827
&#24537;&#32892;&#40065;&#24537;&#32891;&#32884;&#33541;&#32891;&#32869;&#27667;&#32872;&#32860;&#33725;&#28510;?36 ID is 296890
&#24537;&#32892;&#40065;&#24537;&#32891;&#32884;&#33541;&#32877;&#33446;&#27667;&#32892;&#32872;&#33725;&#28510;?33 ID is 296913
&#24537;&#32892;&#40065;&#24537;&#32891;&#32884;&#29483;&#27004;&#39540;&#24537;&#32892;&#40065;&#33725;&#28510;?42 ID is 268542

Oracle's character set is AMERICAN_AMERICA.utf8. Do I need to set Oracle's character set in my script?
Any insights would be really appreciated!

I am trying to improve my English skills, if you see a mistake please feel free to reply or /msg me a correction


Replies are listed 'Best First'.
Re: set unicode in perl by EvanK (Chaplain) on Aug 08, 2008 at 04:10 UTC
Deducing from your example script, I would guess you're outputting to the command line (as opposed to a web browser or a file)? At any rate, the Encode module may be what you want. I'm sure other monks better versed in character encodings could provide more details if needed.

__________
Systems development is like banging your head against a wall... It's usually very painful, but if you're persistent, you'll get through it.
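For readers unfamiliar with Encode, here is a minimal sketch of the two calls that matter. It assumes the driver hands back raw UTF-8 bytes, which may not be true for DBD::ADO; the sample bytes are invented for illustration and do not come from the poster's database.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed sample: the UTF-8 bytes for U+4E2D U+6587, as a driver might return them.
my $bytes = "\xE4\xB8\xAD\xE6\x96\x87";

# decode() turns encoded bytes into a string of Perl characters.
my $chars = decode('UTF-8', $bytes);

# encode() turns characters back into bytes in the chosen output encoding.
my $out = encode('UTF-8', $chars);

printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
# prints "bytes: 6, characters: 2"
```

The same pattern applies to other target encodings: decode once on input, work with characters, and encode once on output.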
Re^2: set unicode in perl by xiaoyafeng (Deacon) on Aug 08, 2008 at 07:20 UTC
Following your suggestion, I've modified the script:

use strict;
use warnings;
use Encode;
use DBI;
use Encode::HanExtra;

#my $dbh = DBI->connect(qq(dbi:Oracle:Athenadb), qq(adas), qq(adas), {RaiseError => 1, ora_charset => 'AL32UTF8'});
my $dbh = DBI->connect(qq(dbi:ADO:Provider=MSDAORA.1;Data Source=AthenaDB), qq(athena), qq(athena), {RaiseError => 1});
my $statment = "select ADAS_ID, ADAS_Name from ADAS_DEVICE";
my $sth = $dbh->prepare($statment) or die $dbh->errstr;
$sth->execute or die $dbh->errstr;
while (my ($ADAS_ID, $ADAS_NAME) = $sth->fetchrow_array) {
    $ADAS_NAME = encode("gb18030", decode("utf8", $ADAS_NAME));
    print "$ADAS_NAME ID is $ADAS_ID\n";
}

But it displays only some of the characters correctly. I notice that I can insert Asian characters into the database accurately. Are the characters broken when they are converted?
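One way to test whether the conversion is what breaks the characters is to decode strictly, so that malformed input fails loudly instead of being silently replaced. This is only a sketch: decode_strict is a hypothetical helper, and the sample bytes are made up rather than fetched from the database.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns the decoded string,
# or undef if $bytes is not well-formed UTF-8.
sub decode_strict {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() may modify its input when a CHECK value is given
    return scalar eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
}

my $good = decode_strict("\xE4\xB8\xAD");    # UTF-8 bytes for U+4E2D
my $bad  = decode_strict("\xD6\xD0");        # GBK bytes for the same character

print defined $good ? "good: valid UTF-8\n" : "good: not UTF-8\n";
print defined $bad  ? "bad: valid UTF-8\n"  : "bad: not UTF-8\n";
```

If values fetched through the ADO driver fail a check like this, they were probably not UTF-8 bytes to begin with, and decode("utf8", ...) is the wrong first step.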
Re^3: set unicode in perl by mje (Curate) on Aug 08, 2008 at 08:36 UTC
If you run the code below (bear in mind that it drops a table called testutf), does it work? Are the files uni.out and utf.out the same afterwards? You can always change it to put more Asian characters in the initial string.

#!/usr/bin/perl -w
use strict;
use warnings;
use DBI qw(:utils);
use charnames ':full';
use Encode;
use bytes ();    # makes bytes::length available without enabling byte semantics

binmode(STDOUT, ":utf8");

# Test string: a smiley, ASCII text, a Hebrew letter and an Arabic letter
my $str = "\x{263a}xxx" . chr(0x05d0) . "\N{ARABIC LETTER ALEF}";
print $str, "\n";
print join(" ", unpack("H*", $str)), "\n";
print "length(str) = ", length($str), "\n";
print "bytes::length(str) = ", bytes::length($str), "\n";
print "utf8::is_utf8 = ", utf8::is_utf8($str) ? 1 : 0, "\n";
print "data_string_desc: ", data_string_desc($str), "\n";

# Write the original string to uni.out
open OUT, ">uni.out";
binmode(OUT, ":utf8");
print OUT "$str\n";
close OUT;

my $dbh = DBI->connect("dbi:Oracle:XX", "XX", "XX", {ora_charset => 'AL32UTF8', ChopBlanks => 1});
$dbh->do("drop table testutf");
$dbh->do("create table testutf (a char(100))");

# Insert the string, then read it back
my $sth = $dbh->prepare("insert into testutf values (?)");
$sth->execute($str);
$sth = $dbh->prepare("select * from testutf");
$sth->execute;
my @row = $sth->fetchrow_array;
print "data_string_desc (after fetch): ", data_string_desc($row[0]), "\n";
print join(" ", unpack("H*", $row[0])), "\n";

# Write the fetched string to utf.out for comparison
open OUT, ">utf.out";
binmode(OUT, ":utf8");
print OUT $row[0];
close OUT;

We use Unicode in Oracle all the time with Japanese, Chinese, Arabic and many other languages, and it works fine so long as the database and character set are set to AL32UTF8.

Also bear in mind that there are Oracle database downloads (I think for Oracle XE) which are not labeled "international" and don't do Unicode.

Lastly, what you see when you print Unicode to your terminal is no indication of whether the data you retrieved is correct or not - that largely depends on how your terminal is set up and whether your system can display the characters you have output.
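Since terminal output can't be trusted, one option is to print the code points in plain ASCII and compare them before the insert and after the fetch. A minimal sketch follows; codepoints is a hypothetical helper name, not part of DBI or Encode.

```perl
use strict;
use warnings;

# Hypothetical helper: list the code points of a string as U+XXXX,
# so the result is readable on any terminal.
sub codepoints {
    my ($str) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $str;
}

my $str = "\x{263a}xxx" . chr(0x05d0);
print codepoints($str), "\n";
# prints "U+263A U+0078 U+0078 U+0078 U+05D0"
```

If the fetched value shows one code point per byte (for example U+00E2 U+0098 U+00BA instead of U+263A), the data came back as undecoded UTF-8 bytes rather than characters.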