in reply to Re^3: Inserting UTF-8 on Mysql using DBI
in thread Inserting UTF-8 on Mysql using DBI

I found the problem. As it turns out, neither the mysql client nor PHP was using UTF-8 after all, only Perl. Ugh. One thing I still don't understand, though, is why the characters were being displayed correctly in the browser despite the fact that the page always had the Content-Type UTF-8 header. I guess I understand charset encoding even less now.

Re^5: Inserting UTF-8 on Mysql using DBI
by Corion (Patriarch) on Oct 16, 2010 at 17:10 UTC

    Browsers really like to make a "best effort" at guessing the encoding of the content, even if that means deviating from the Content-Type: text/html; charset=utf-8 header. That is why eliminating all intermediaries and cross-checking every step is the only approach I know that works.

Re^5: Inserting UTF-8 on Mysql using DBI
by afoken (Chancellor) on Oct 18, 2010 at 14:43 UTC

    Smells like "the other" programs inserted UTF-8 byte streams that luckily came back unmodified from MySQL. So you could insert and fetch something that looked like UTF-8, even though MySQL converted the byte stream from what it thought was ISO-8859-1 to broken UTF-8 while inserting, and back from broken UTF-8 to ISO-8859-1 while fetching. A big hint that this is going wrong is that the strings have the wrong length in the database (one or two extra characters for each non-ASCII character). Have a look at the Unicode tests in DBD::ODBC, especially t/40UnicodeRoundTrip.t and t/41Unicode.t.
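
    The round trip described above can be sketched at the byte level. This is only an illustration in Python, not what MySQL does internally: the two function names are hypothetical, and Python's "latin-1" codec stands in for a connection that was declared as ISO-8859-1.

```python
# Hypothetical model of a connection declared as ISO-8859-1 in front of
# a table that stores UTF-8. Not MySQL's actual implementation.

def store_via_latin1_connection(client_bytes: bytes) -> str:
    """The server reads each incoming byte as one ISO-8859-1 character."""
    return client_bytes.decode("latin-1")

def fetch_via_latin1_connection(stored: str) -> bytes:
    """The server converts each stored character back to one byte."""
    return stored.encode("latin-1")

intended = "K\u00e4se"                    # "Kaese" with a-umlaut, 4 characters
client_bytes = intended.encode("utf-8")   # what the client really sends, 5 bytes

stored = store_via_latin1_connection(client_bytes)
stored_bytes = stored.encode("utf-8")     # the "broken UTF-8" held in the table

fetched = fetch_via_latin1_connection(stored)

print(len(intended), len(stored), len(stored_bytes))
print(fetched == client_bytes)
```

    The stored string has five characters instead of four, which is the length hint mentioned above: one extra character for a two-byte UTF-8 sequence, two extra for a three-byte sequence. The fetched bytes are identical to what the client sent, which is why nothing looked wrong from the outside.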

    The browser shows the correct characters because you told it explicitly to do so: There is a UTF-8 byte stream in the HTML resource delivered by the server, and the HTML resource (or its headers) says that it is encoded as UTF-8. It simply does not matter that the software generating the page accidentally or intentionally wrote that byte stream as what it thought to be ISO-8859-1 characters.
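
    A minimal sketch of the browser side, again in Python and heavily simplified: the hypothetical render function only models "decode the body with the declared charset", which is all that matters here.

```python
def render(body: bytes, declared_charset: str) -> str:
    """Hypothetical stand-in for a browser decoding a response body."""
    return body.decode(declared_charset)

# The bytes on the wire are a UTF-8 stream, no matter what the
# generating software believed it was writing.
body = "K\u00e4se".encode("utf-8")

as_utf8 = render(body, "utf-8")          # the intended 4 characters
as_latin1 = render(body, "iso-8859-1")   # 5 characters, visibly garbled

print(as_utf8)
print(as_latin1)
```

    With the charset=utf-8 header the same bytes decode to the intended text. With an ISO-8859-1 declaration, the a-umlaut would show up as two separate characters.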

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)