
I just installed the current version of DBD::CSV from CPAN (which involved updating my version of DBD::File so that I could install SQL::Statement). I wasn't able to find any mention of the "unicode => 1" attribute setting that you mentioned: not in DBI, Text::CSV_XS, DBD::File or DBD::CSV. (Where did you find that, exactly?)

I tried this little test script, which involves storing a string that includes an Arabic character in each row:

use strict;
use DBI;

my $Usage = "$0 ";

my $db = DBI->connect("DBI:CSV:");  # use current working directory

$db->do("CREATE TABLE csv_test (id INTEGER, name CHAR(16))");

my $sth = $db->prepare("INSERT INTO csv_test (id,name) values (?,?)");
my @strings = ( "one \x{0661}",
                "two \x{0662}",
                "three \x{0663}" );
binmode STDOUT, ":utf8";
for (0..$#strings) {
    printf "inserting %d,%s\n", $_+1, $strings[$_];
    $sth->execute( $_+1, $strings[$_] );
}
$sth->finish();

$db->disconnect;

Having run that, I found that the resulting csv_test "database" file did in fact contain valid and correct utf8 characters, as intended. When I added the following lines to the script (before the $db->disconnect call) and ran it again, I saw the problem:

$sth = $db->prepare("SELECT * FROM csv_test");
$sth->execute;
while( my $row = $sth->fetchrow_arrayref ) {
    printf "retrieved %d,%s\n", @$row;
}
$sth->finish;

The problem is that when the perl script reads the strings back from the "database" file, it has no way of knowing that they are utf8; it just gets the raw bytes. To get the output right, I had to add use Encode; to the script and put the following line just before the printf statement:

    $$row[1] = decode( "utf8", $$row[1] );
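To make the failure mode concrete, here is a small core-Perl sketch that needs no DBD::CSV at all. It assumes, as observed above, that the values fetched back from the CSV file are the raw UTF-8 octets; encode() stands in for the round trip through the file, and decode() is the same fix as the line above.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A character string like the ones inserted above: "one " plus ARABIC-INDIC DIGIT ONE.
my $original = "one \x{0661}";

# Simulate what comes back from the file: the UTF-8 octets,
# with nothing telling Perl that they encode characters.
my $from_file = encode("utf8", $original);

printf "characters: %d, octets read back: %d\n",
    length($original), length($from_file);    # characters: 5, octets read back: 6

# Decoding the octets restores the original character string.
my $decoded = decode("utf8", $from_file);
print $decoded eq $original ? "round trip ok\n" : "mismatch\n";    # round trip ok
```

The length mismatch is the whole story: U+0661 takes two bytes in UTF-8, so until the value is decoded Perl sees six "characters" where there should be five.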

It's sort of a shame not being able to tell perl that the file data should be read as utf8 in the first place; you just have to work around that on your own with Encode. You could do that as part of the fetch:

my @values = map { decode( "utf8", $_ ) } $sth->fetchrow_array;
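The decode-on-fetch idea can be wrapped in a small helper so every fetch loop gets it for free. This is only a sketch: fetchrow_decoded and the FakeSth package are hypothetical names made up here, with FakeSth standing in for a real DBI statement handle so the example runs without a CSV file. With a real handle you would simply pass your own $sth. NULL columns (undef) are passed through as undef rather than decoded.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: fetch one row from a DBI-style statement handle and
# decode every non-NULL column from UTF-8 octets to Perl characters.
# Returns an empty list once the handle has no more rows.
sub fetchrow_decoded {
    my ($sth) = @_;
    return map { defined $_ ? decode("utf8", $_) : undef } $sth->fetchrow_array;
}

# Hypothetical stand-in for a real statement handle so this sketch runs on
# its own: it hands back UTF-8 octets, as the CSV driver did above.
package FakeSth;
sub new { my ($class, @rows) = @_; return bless { rows => [@rows] }, $class }
sub fetchrow_array {
    my $self = shift;
    my $r = shift @{ $self->{rows} };
    return $r ? @$r : ();
}

package main;

my $sth = FakeSth->new(
    [ 1, encode("utf8", "one \x{0661}") ],
    [ 2, undef ],    # a NULL column stays undef
);

binmode STDOUT, ":utf8";
my @rows;
while ( my @values = fetchrow_decoded($sth) ) {
    push @rows, [@values];
    printf "retrieved %d,%s\n", $values[0], $values[1] // "(null)";
}
```

Because a list assignment in scalar context yields the number of values fetched, the while loop ends when fetchrow_array returns an empty list at the end of the data, just as a plain fetchrow_array loop would.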

In reply to Re: DBD::CSV with utf8 by graff
in thread DBD::CSV with utf8 by perlmonkdr
