Sly_G has asked for the wisdom of the Perl Monks concerning the following question:

Using Russian (non-ASCII) characters (a 7-letter word):
use utf8;
use open OUT => ':utf8';
use DBI;
my $dbh = DBI->connect("DBI:mysql:database=mybase;host=localhost;port=3306", "login", "pass");
#$dbh->do('SET CHARACTER SET utf8');
open TST, '>utftest1.txt';
binmode TST;
print TST "русский";
$test = $dbh->selectrow_array("SELECT 'русский'");
open TST, '>utftest2.txt';
binmode TST;
print TST $test;
Result: utftest1.txt contains 14 bytes, and I can see the word in it with any text editor (7 characters x 2 bytes, which makes sense). utftest2.txt contains 28 bytes of I don't know what:
0000000000: C3 91 C2 80 C3 91 C2 83 | C3 91 C2 81 C3 91 C2 81
0000000010: C3 90 C2 BA C3 90 C2 B8 | C3 90 C2 B9
Uncommenting the line that sets the character set changes nothing.

Re: utf-8 problems
by moritz (Cardinal) on Jan 09, 2012 at 16:05 UTC
Re: utf-8 problems
by mbethke (Hermit) on Jan 09, 2012 at 21:48 UTC

    What Moritz said. As for what is happening: your UTF-8 goes into the DB fine, but when it comes out, Perl thinks the bytes are Latin-1 and re-encodes them as UTF-8. Since every byte falls in the upper half of the 8-bit range (0x80 and above), each one becomes two bytes in UTF-8, so the result is twice as long: 28 bytes instead of 14.
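To make the double encoding concrete, here is a minimal sketch that reproduces it without a database. It only simulates the effect described above using the core Encode module; the variable names are illustrative, and it makes no claim about where exactly in the original script the second encoding step happens.

```perl
use strict;
use warnings;
use utf8;                          # this source file is itself UTF-8
use Encode qw(encode decode);

my $word  = "русский";                 # 7 characters
my $bytes = encode('UTF-8', $word);    # 14 bytes: what the file should contain

# Simulate the mistake: read each UTF-8 byte as one Latin-1 character,
# then encode the resulting string as UTF-8 a second time.
my $as_latin1 = decode('ISO-8859-1', $bytes);
my $double    = encode('UTF-8', $as_latin1);   # 28 bytes

# Hex dump of the double-encoded bytes, for comparison with utftest2.txt.
my $hex = uc(join(' ', unpack('(H2)*', $double)));
print length($bytes), " bytes, then ", length($double), " bytes\n";
print "$hex\n";

# Undoing one layer of encoding recovers the original 14 bytes.
my $repaired = encode('ISO-8859-1', decode('UTF-8', $double));
print "repaired OK\n" if $repaired eq $bytes;
```

The hex line it prints should match the dump of utftest2.txt in the question. Stripping one layer afterwards, as in the last step, is only a workaround. The cleaner route is usually to have the driver return properly decoded strings, and the DBD::mysql documentation on the `mysql_enable_utf8` connect attribute is worth checking for that.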