I just installed the current version of DBD::CSV from CPAN (which involved updating my version of DBD::File so that I could install SQL::Statement). I wasn't able to find any mention of the "unicode => 1" attribute setting that you mentioned -- not in DBI, Text::CSV_XS, DBD::File, or DBD::CSV. (Where did you find that, exactly?)
I tried this little test script, which involves storing a string that includes an Arabic character in each row:
use strict;
use warnings;
use DBI;

my $db = DBI->connect("DBI:CSV:"); # use current working directory

$db->do("CREATE TABLE csv_test (id INTEGER, name CHAR(16))");
my $sth = $db->prepare("INSERT INTO csv_test (id,name) VALUES (?,?)");

# each string ends with an Arabic-Indic digit (U+0661 .. U+0663)
my @strings = ( "one \x{0661}",
                "two \x{0662}",
                "three \x{0663}" );

binmode STDOUT, ":utf8"; # avoid "wide character" warnings from printf

for (0..$#strings) {
    printf "inserting %d,%s\n", $_+1, $strings[$_];
    $sth->execute( $_+1, $strings[$_] );
}
$sth->finish();
$db->disconnect;
Having run that, I found that the resulting csv_test "database" file did in fact contain valid, correctly encoded UTF-8, as intended. When I added the following lines to the script and ran it again, I saw the problem:
$sth = $db->prepare("SELECT * FROM csv_test");
$sth->execute;
while ( my $row = $sth->fetchrow_arrayref ) {
    printf "retrieved %d,%s\n", @$row;
}
$sth->finish;
The problem is that when the Perl script reads the strings back from the "database" file, it has no way of knowing they are UTF-8, so they come back as raw bytes. To get the output to come out right, I have to add "use Encode;" to the script and add the following line just before the printf statement:
$$row[1] = decode( "utf8", $$row[1] );
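To see exactly what that decode() call is doing, here's a minimal standalone sketch using only the core Encode module (no DBI involved); it simulates the raw bytes that come back from the CSV file and shows the byte-string vs. character-string difference:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

binmode STDOUT, ":utf8";

# simulate what the fetch returns: the raw UTF-8 bytes from the file
my $bytes = encode( "utf8", "one \x{0661}" );
print length($bytes), "\n";   # 6 -- "one " is 4 bytes, U+0661 encodes to 2

# decode() turns those bytes into a Perl character string
my $chars = decode( "utf8", $bytes );
print length($chars), "\n";   # 5 -- "one " plus one Arabic-Indic digit
```

Until you call decode(), length() and printf are operating on the two UTF-8 bytes of U+0661 as if they were two separate characters, which is why the output looks mangled.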
It's sort of a shame not being able to tell Perl that the file data should be read as UTF-8 in the first place; you just have to work around it yourself with Encode. You could also do the decoding as part of the fetch:
my @values = map { decode( "utf8", $_ ) } $sth->fetchrow_array;
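That said, newer versions of DBD::File do document an f_encoding attribute (alongside f_dir) that puts an encoding layer on the underlying file handle, so the driver itself decodes on read and encodes on write. I haven't verified which version introduced it, so treat this as a sketch that depends on your installed DBD::File being recent enough:

```perl
use strict;
use warnings;
use DBI;

# f_dir and f_encoding are DBD::File attributes passed through DBD::CSV;
# f_encoding requires a reasonably recent DBD::File
my $db = DBI->connect( "DBI:CSV:", undef, undef, {
    f_dir      => ".",      # directory holding the CSV files
    f_encoding => "utf8",   # read/write files through an encoding layer
    RaiseError => 1,
} );

# assumes the csv_test table created by the earlier script
my $sth = $db->prepare("SELECT * FROM csv_test");
$sth->execute;
binmode STDOUT, ":utf8";
while ( my $row = $sth->fetchrow_arrayref ) {
    # name arrives as a character string; no manual decode() needed
    printf "retrieved %d,%s\n", @$row;
}
$sth->finish;
$db->disconnect;
```

If that attribute works on your installation, both the Encode workaround and the map-over-fetchrow_array trick become unnecessary.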