DBD::mysql is transforming valid utf8 into gibberish(?) on the way through mysql.so:
746572e280a62ee280... becomes 746572c3a2c280c2a6...

UPDATE

mysql.so appears to be doing a gratuitous utf8::encode($s) on strings that have the utf8::is_utf8($s) flag set. I was able to compensate by running the following just before the call to DBD::mysql's execute:

    # decode in place every bind value that carries the utf8 flag
    for (keys %$row) { utf8::decode($row->{$_}) if utf8::is_utf8($row->{$_}) }
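For anyone trying to follow what that loop does, here is a minimal sketch, not the code from my application. The helper name decode_flagged_values and the sample row are made up for illustration. It assumes the bind value is a flagged string whose characters are really the individual UTF-8 octets (built here with utf8::upgrade), which is the only case where utf8::decode changes anything on a flagged string:

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the workaround loop.
sub decode_flagged_values {
    my ($row) = @_;
    for my $key (keys %$row) {
        utf8::decode($row->{$key}) if utf8::is_utf8($row->{$key});
    }
    return $row;
}

# Assumption: a flagged string holding the octets e2 80 a6 as three
# separate characters (U+00E2 U+0080 U+00A6) instead of one U+2026.
my $suspect = pack 'H*', '746572e280a6';
utf8::upgrade($suspect);    # sets the flag, leaves the characters unchanged

my $row = decode_flagged_values({ content => $suspect, job_id => 7 });

# A genuinely decoded string is left alone: utf8::decode returns false
# because U+2026 cannot be downgraded to a single octet.
my $genuine  = "ter\x{2026}";
my $modified = utf8::decode($genuine) ? 1 : 0;

printf "content: %d chars, last is U+%04X\n",
    length $row->{content}, ord substr($row->{content}, -1);
printf "genuine: modified=%d, %d chars\n", $modified, length $genuine;
```

If that assumption holds, the loop is turning six characters back into four, which would suggest the value was already UTF-8 octets with the flag switched on before it reached execute.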

END UPDATE

Neither "use open ':utf8';" nor "use open ':encoding(UTF-8)';" changed the bogus behavior.

The target table's dump starts:

    CREATE TABLE `host_MyApp_DUFs` (
      `DUF_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
      `member_id` bigint(20) unsigned NOT NULL COMMENT '',
      `install_id` tinyint(1) unsigned NOT NULL COMMENT '',
      `job_id` int(12) unsigned NOT NULL DEFAULT '0' COMMENT '',
      `content` longtext COLLATE utf8_unicode_ci COMMENT '',
      `token_id` bigint(20) unsigned DEFAULT NULL COMMENT 'links to `host_MyApp_tokens`',
      PRIMARY KEY (`member_id`,`install_id`,`job_id`,`DUF_id`),
      UNIQUE KEY `token_id` (`token_id`),
      KEY `DUF_id` (`DUF_id`)
    ) ENGINE=InnoDB AUTO_INCREMENT=406 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci COMMENT='';
    /*!40101 SET character_set_client = @saved_cs_client */;

The 'content' field of this table is being populated with a valid UTF-8 string (as verified with Test::utf8's "is_sane_utf8" and "is_flagged_utf8"), passed as a bind value to the "execute" method.
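As a complement to the Test::utf8 checks, a core-only way to look at a bind value just before execute is to dump its flag and code points. This is only a sketch; describe_string is a hypothetical helper name, not something from DBI or Test::utf8:

```perl
use strict;
use warnings;

# Hypothetical helper: report the utf8 flag, the character count and
# the code points of a string, without touching the string itself.
sub describe_string {
    my ($s) = @_;
    return sprintf 'flag=%d chars=%d codepoints=%s',
        utf8::is_utf8($s) ? 1 : 0,
        length $s,
        join ' ', map { sprintf 'U+%04X', ord($_) } split //, $s;
}

# A properly decoded string: "ter" followed by U+2026.
print describe_string("ter\x{2026}"), "\n";

# The same text as UTF-8 octets with the flag forced on.
my $octets_as_chars = pack 'H*', '746572e280a6';
utf8::upgrade($octets_as_chars);
print describe_string($octets_as_chars), "\n";
```

The first call reports four characters ending in U+2026; the second reports six characters ending in U+00E2 U+0080 U+00A6. Both strings are flagged, so the flag alone does not distinguish them.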

The connection was opened with:

    my $dbix = DBIx::Lite->connect(
        "dbi:mysql:dbname=$ENV{DB_NAME}",
        $ENV{DB_USER},
        $ENV{DB_PASSWORD},
        { mysql_enable_utf8 => 1 },
    );

And strace verifies the "SET NAMES utf8" command is traversing the socket from the client to the server. Moreover the query:

show VARIABLES LIKE 'character_set%';

results in:

    DB<6> x $st->fetchrow
    0  'character_set_client'
    1  'utf8'
    DB<7> x $st->fetchrow
    0  'character_set_connection'
    1  'utf8'
    DB<8> x $st->fetchrow
    0  'character_set_database'
    1  'utf8'
    DB<9> x $st->fetchrow
    0  'character_set_filesystem'
    1  'utf8'
    DB<10> x $st->fetchrow
    0  'character_set_results'
    1  'utf8'
    DB<11> x $st->fetchrow
    0  'character_set_server'
    1  'utf8'
    DB<12> x $st->fetchrow
    0  'character_set_system'
    1  'utf8'
    DB<13> x $st->fetchrow
    0  'character_sets_dir'
    1  '/usr/share/mysql/charsets/'

However, strace of the data going from the client to the server shows an octet string that has everything intact (i.e., 'INSERT INTO ...', the regular ASCII data for content, etc.) except the multi-octet UTF-8 characters, which have been transformed. An example is:

746572e280a62ee280... becomes 746572c3a2c280c2a6...
Is it time to go to uuencode or carrier pigeon with the elder futhark or something?
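For reference, the transformed octets in the example are what comes out when octets that are already UTF-8 get UTF-8 encoded a second time. A minimal sketch that reproduces the first few octets with nothing but core Perl (encode_again is a hypothetical helper name; this says nothing about where inside the driver the second encode actually happens):

```perl
use strict;
use warnings;

# Hypothetical helper: UTF-8 encode a string that already holds UTF-8 octets.
sub encode_again {
    my ($octets) = @_;
    my $copy = $octets;
    utf8::encode($copy);    # each octet is treated as one character
    return $copy;
}

my $before = pack 'H*', '746572e280a6';    # "ter" followed by U+2026 in UTF-8
my $after  = encode_again($before);

print unpack('H*', $before), ' becomes ', unpack('H*', $after), "\n";
# prints: 746572e280a6 becomes 746572c3a2c280c2a6
```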

In reply to UTF8 to Mysql transformed by mysql.so? by jabowery
