in reply to set unicode in perl

Judging from your example script, I would guess you're outputting to the command line (as opposed to a web browser or a file)? At any rate, the Encode module may be what you want. I'm sure other monks better versed in character encodings could provide more details if needed.
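For what it's worth, here is a minimal sketch of the decode/encode round trip that Encode provides. The input bytes are a made-up sample (not taken from your database), and cp936 (GBK) is used only because it ships with core Encode; your target encoding may well differ.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample input: the UTF-8 bytes for U+4E2D U+6587.
my $bytes = "\xE4\xB8\xAD\xE6\x96\x87";

# decode() turns encoded bytes into a Perl character string.
my $chars = decode('UTF-8', $bytes);

# encode() turns the character string into bytes in the target encoding
# (cp936 is the GBK code page bundled with Encode).
my $gbk = encode('cp936', $chars);

printf "characters: %d, UTF-8 bytes: %d, GBK bytes: %d\n",
    length($chars), length($bytes), length($gbk);
# prints: characters: 2, UTF-8 bytes: 6, GBK bytes: 4
```

The point is that the data is decoded exactly once on the way in and encoded exactly once on the way out; decoding something that is already a character string is a common source of mangled output.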

__________
Systems development is like banging your head against a wall...
It's usually very painful, but if you're persistent, you'll get through it.

Re^2: set unicode in perl
by xiaoyafeng (Deacon) on Aug 08, 2008 at 07:20 UTC
    Following your suggestion, I've modified the script:
        use strict;
        use warnings;
        use Encode;
        use DBI;
        use Encode::HanExtra;

        #my $dbh = DBI->connect(qq(dbi:Oracle:Athenadb), qq(adas), qq(adas),
        #    {RaiseError => 1, ora_charset => 'AL32UTF8'});
        my $dbh = DBI->connect(qq(dbi:ADO:Provider=MSDAORA.1;Data Source=AthenaDB),
            qq(athena), qq(athena), {RaiseError => 1});

        my $statment = "select ADAS_ID, ADAS_Name from ADAS_DEVICE";
        my $sth = $dbh->prepare($statment) or die $dbh->errstr;
        $sth->execute or die $dbh->errstr;

        while (my ($ADAS_ID, $ADAS_NAME) = $sth->fetchrow_array) {
            # Treat the fetched value as UTF-8 bytes and re-encode it as GB18030
            $ADAS_NAME = encode("gb18030", decode("utf8", $ADAS_NAME));
            print "$ADAS_NAME ID is $ADAS_ID\n";
        }
    But it displays only some of the characters correctly. I notice that I can insert Asian characters into the database accurately. Are the characters being corrupted when they are converted?

    I am trying to improve my English skills, if you see a mistake please feel free to reply or /msg me a correction

      If you run the code below (bear in mind it drops a table called testutf), does it work? Are the files uni.out and utf.out the same afterwards?

      You can always change it to put more Asian characters in the initial string.

          #!/usr/bin/perl -w
          use strict;
          use warnings;
          use DBI qw(:utils);
          use charnames ':full';
          use Encode;
          use bytes ();    # makes bytes::length available without enabling byte semantics

          binmode(STDOUT, ":utf8");

          # Test string: U+263A, "xxx", Hebrew alef, Arabic alef
          my $str = "\x{263a}xxx" . chr(0x05d0) . "\N{ARABIC LETTER ALEF}";
          print $str, "\n";
          print join(" ", unpack("H*", $str)), "\n";
          print "length(str) = ", length($str), "\n";
          print "bytes::length(str) = ", bytes::length($str), "\n";
          print "utf8::is_utf8 = ", utf8::is_utf8($str) ? 1 : 0, "\n";
          print "data_string_desc: ", data_string_desc($str),"\n";

          # Write the original string to uni.out as UTF-8
          open OUT, ">uni.out";
          binmode(OUT, ":utf8");
          print OUT "$str\n";

          my $dbh = DBI->connect("dbi:Oracle:XX", "XX", "XX",
              {ora_charset => 'AL32UTF8', ChopBlanks => 1});
          $dbh->do("drop table testutf");
          $dbh->do("create table testutf (a char(100))");
          my $sth = $dbh->prepare("insert into testutf values (?)");
          $sth->execute($str);

          # Fetch the string back and describe what came out of the database
          $sth = $dbh->prepare("select * from testutf");
          $sth->execute;
          my @row = $sth->fetchrow_array;
          print "data_string_desc (after fetch): ", data_string_desc($row[0]),"\n";
          print join(" ", unpack("H*", $row[0])), "\n";

          # Write the fetched string to utf.out as UTF-8
          open OUT, ">utf.out";
          binmode (OUT, ":utf8");
          print OUT $row[0];
          close OUT;
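      If you'd rather not eyeball the two files, something like the following might do for the comparison. It is only a sketch: slurp_utf8 and same_text are hypothetical helper names, and a single trailing newline is ignored because the script above writes one to uni.out but not to utf.out.

```perl
use strict;
use warnings;

# Hypothetical helper: read a UTF-8 file into a character string,
# dropping one trailing newline if present.
sub slurp_utf8 {
    my ($path) = @_;
    open my $fh, '<:encoding(UTF-8)', $path
        or die "Cannot open $path: $!";
    local $/;                       # slurp mode: read the whole file at once
    my $text = <$fh>;
    close $fh;
    $text = '' unless defined $text;
    $text =~ s/\n\z//;
    return $text;
}

# Hypothetical helper: true if both files hold the same characters.
sub same_text {
    my ($left, $right) = @_;
    return slurp_utf8($left) eq slurp_utf8($right);
}

# Only compare when both files from the test script exist.
if (-e 'uni.out' && -e 'utf.out') {
    print same_text('uni.out', 'utf.out') ? "same\n" : "different\n";
}
```

      Comparing decoded characters rather than what the terminal shows keeps the display out of the picture entirely.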

      We use Unicode in Oracle all the time with Japanese, Chinese, Arabic and many other languages, and it works fine so long as the database and the client character set are set to AL32UTF8. Also bear in mind that there are Oracle database downloads (I think for Oracle XE) which are not labeled "international" and don't do Unicode.

      Lastly, what you see when you print Unicode to your terminal is no indication of whether the data you retrieved is correct or not. That largely depends on how your terminal is set up and whether your system can display the characters you have output.
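      One way to take the terminal out of the equation is to print code points instead of the characters themselves. A rough sketch, where codepoints is a hypothetical helper and the sample string is the same one used in the test script:

```perl
use strict;
use warnings;

# Hypothetical helper: render each character of a string as U+XXXX.
# The output is plain ASCII, so any terminal can display it.
sub codepoints {
    my ($str) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $str;
}

my $expected = "\x{263A}xxx\x{05D0}\x{0627}";
print codepoints($expected), "\n";
# prints: U+263A U+0078 U+0078 U+0078 U+05D0 U+0627
```

      If the value fetched from the database gives the same list of code points as the string you inserted, the data is intact and any garbage on screen is a display problem.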