xiaoyafeng has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I spent all of yesterday studying a weird problem with a script converted from VBS, but failed. Now I have to use DBI to store the data directly. Everything seems fine except one thing: columns containing Asian-language text don't display correctly. Below is my code.
use strict;
use warnings;
use DBI;

binmode(STDOUT, ":utf8");

#my $dbh = DBI->connect(qq(dbi:Oracle:Athenadb), qq(athena), qq(athena), {RaiseError => 1, ora_charset => 'AL32UTF8'});
my $dbh = DBI->connect(qq(dbi:ADO:Provider=MSDAORA.1;Data Source=AthenaDB), qq(athena), qq(athena), {RaiseError => 1});
my $statment = "select ADAS_ID, ADAS_Name from ADAS_DEVICE";
my $sth = $dbh->prepare($statment) or die $dbh->errstr;
$sth->execute or die $sth->errstr;
while (my ($ADAS_ID, $ADAS_NAME) = $sth->fetchrow_array) {
    print "$ADAS_NAME ID is $ADAS_ID \n";
}
__OUTPUT__
忙聼鲁忙聻聴忙聳&#25523;氓楼楼莽潞?35 ID is 296827
忙聼鲁忙聻聴茅聻&#32869;氓聨聜莽潞?36 ID is 296890
忙聼鲁忙聻聴茅聭&#33446;氓聼聨莽潞?33 ID is 296913
忙聼鲁忙聻聴猫楼&#39540;忙聼鲁莽潞?42 ID is 268542
Oracle's character set is AMERICAN_AMERICA.utf8. Do I need to set Oracle's character set in my script?
Any insights would be really appreciated!

I am trying to improve my English skills, if you see a mistake please feel free to reply or /msg me a correction

Re: set unicode in perl
by EvanK (Chaplain) on Aug 08, 2008 at 04:10 UTC
    Deducing from your example script, I would guess you're outputting to the command-line (as opposed to a web browser or a file)? At any rate, the Encode module may be what you want. I'm sure other monks better versed in character encodings could provide more details if needed.

    __________
    Systems development is like banging your head against a wall...
    It's usually very painful, but if you're persistent, you'll get through it.
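
To make the Encode suggestion concrete, here is a minimal sketch of decoding bytes once on input and encoding once on output. It assumes (this is a guess, not something confirmed in the thread) that the driver hands back undecoded UTF-8 bytes; `$raw_bytes` is a hypothetical stand-in for such a value. It also shows what happens when undecoded bytes are encoded again, which is the effect a ":utf8" output layer would have on them.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: the UTF-8 bytes for two Chinese characters,
# standing in for a value a driver returned without decoding it.
my $raw_bytes = "\xE6\x9F\xB3\xE6\x9E\x97";   # UTF-8 for U+67F3 U+6797

# Decode once at the input boundary: bytes -> Perl characters.
my $chars = decode('UTF-8', $raw_bytes);

# Encoding the decoded string gives back the original bytes.
my $good = encode('UTF-8', $chars);

# Encoding the *undecoded* bytes treats each byte as a Latin-1
# character, so every byte above 0x7F becomes two bytes.
my $double = encode('UTF-8', $raw_bytes);

printf "characters after decode: %d\n", length $chars;
printf "bytes, encoded once:     %d\n", length $good;
printf "bytes, double-encoded:   %d\n", length $double;
```

In this sample the double-encoded string is twice as long as the correctly encoded one (12 bytes against 6), which is one quick way to spot the problem without looking at the terminal.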

      Following your suggestion, I've modified the script:
      use strict;
      use warnings;
      use Encode;
      use DBI;
      use Encode::HanExtra;

      #my $dbh = DBI->connect(qq(dbi:Oracle:Athenadb), qq(adas), qq(adas), {RaiseError => 1, ora_charset => 'AL32UTF8'});
      my $dbh = DBI->connect(qq(dbi:ADO:Provider=MSDAORA.1;Data Source=AthenaDB), qq(athena), qq(athena), {RaiseError => 1});
      my $statment = "select ADAS_ID, ADAS_Name from ADAS_DEVICE";
      my $sth = $dbh->prepare($statment) or die $dbh->errstr;
      $sth->execute or die $sth->errstr;
      while (my ($ADAS_ID, $ADAS_NAME) = $sth->fetchrow_array) {
          # Decode the fetched value as UTF-8, then re-encode it as GB18030 for the console
          $ADAS_NAME = encode("gb18030", decode("utf8", $ADAS_NAME));
          print "$ADAS_NAME ID is $ADAS_ID\n";
      }
      But it displays only some of the characters correctly. I notice that I can insert Asian characters into the database accurately. Are the characters being corrupted during the conversion?


        If you run the code below (bear in mind that it drops a table called testutf), does it work? Are the files uni.out and utf.out the same afterwards?

        You can always change it to put more Asian characters in the initial string.

        #!/usr/bin/perl -w
        use strict;
        use warnings;
        use DBI qw(:utils);
        use charnames ':full';
        use Encode;
        use bytes ();   # makes bytes::length available without enabling byte semantics

        binmode(STDOUT, ":utf8");

        # Build a test string and describe it before it goes anywhere near the database
        my $str = "\x{263a}xxx" . chr(0x05d0) . "\N{ARABIC LETTER ALEF}";
        print $str, "\n";
        print join(" ", unpack("H*", $str)), "\n";
        print "length(str) = ", length($str), "\n";
        print "bytes::length(str) = ", bytes::length($str), "\n";
        print "utf8::is_utf8 = ", utf8::is_utf8($str) ? 1 : 0, "\n";
        print "data_string_desc: ", data_string_desc($str), "\n";

        # Reference copy of the string, written as UTF-8
        open OUT, ">uni.out";
        binmode(OUT, ":utf8");
        print OUT "$str\n";

        my $dbh = DBI->connect("dbi:Oracle:XX", "XX", "XX", {ora_charset => 'AL32UTF8', ChopBlanks => 1});
        $dbh->do("drop table testutf");
        $dbh->do("create table testutf (a char(100))");

        # Round-trip the string through the database
        my $sth = $dbh->prepare("insert into testutf values (?)");
        $sth->execute($str);
        $sth = $dbh->prepare("select * from testutf");
        $sth->execute;
        my @row = $sth->fetchrow_array;
        print "data_string_desc (after fetch): ", data_string_desc($row[0]), "\n";
        print join(" ", unpack("H*", $row[0])), "\n";

        # Copy of the fetched value, for comparison with uni.out
        open OUT, ">utf.out";
        binmode(OUT, ":utf8");
        print OUT $row[0];
        close OUT;

        We use Unicode in Oracle all the time with Japanese, Chinese, Arabic and many other languages, and it works fine so long as the database and character set are set to AL32UTF8. Also bear in mind that there are Oracle database downloads (I think for Oracle XE) which are not labeled "international" and don't do Unicode.

        Lastly, what you see when you print unicode to your terminal is no indication of whether the data you retrieved is correct or not - that largely depends on how your terminal is set up and whether your system can display the characters you have output.
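
Since the terminal can't be trusted, one way to check a fetched value is to print its code points instead of the characters themselves. The sketch below is only an illustration: `code_points` is a hypothetical helper, not part of DBI or Encode, and it uses the same test characters as the script above, with no database involved.

```perl
use strict;
use warnings;

# Return a string's code points as "U+XXXX" tokens, so the content can
# be checked without relying on what the terminal is able to render.
sub code_points {
    my ($string) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $string;
}

# Same characters as the test string in the script above;
# U+0627 is ARABIC LETTER ALEF.
my $str = "\x{263a}xxx" . chr(0x05d0) . "\x{0627}";

print code_points($str), "\n";
```

If the value fetched from the database has been decoded properly, running it through the same helper should give the same list of code points as the original string. A list made up entirely of values below U+0100 would suggest that raw bytes came back instead of characters.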