So, I know it's all confusing. It took me forever to get it, but it's actually really simple. A string of bytes is nothing by itself; it's just binary data. You have to know what encoding it's supposed to be in, and tell your code so, both when decoding from binary and when encoding back to it. The raw data doesn't know (well, some encodings can carry a BOM, but that's not something you can rely on here). Your DBI/DBD driver can do the decode/encode two-step for you automatically, as I suggested (it might work even if the table definition is wrong, but it's best to make sure the two agree). :P Examples of the setting to check include:
- DBD::mysql -> mysql_enable_utf8
  - This attribute determines whether DBD::mysql should assume strings stored in the database are utf8. This feature defaults to off.
- DBD::SQLite -> sqlite_unicode
  - If the attribute $dbh->{sqlite_unicode} is set, strings coming from the database and passed to the collation function will be properly tagged with the utf8 flag; but this only works if the attribute is set before the first call to a perl collation sequence. The recommended way to activate unicode is to set the sqlite_unicode parameter at connection time.
- DBD::Pg -> pg_enable_utf8 (integer)
  - DBD::Pg specific attribute. The behavior of DBD::Pg with regards to this flag has changed as of version 3.0.0. The default value for this attribute, -1, indicates that the internal Perl utf8 flag will be turned on for all strings coming back from the database if the client_encoding is set to 'UTF8'. Use of this default is highly encouraged. If your code was previously using pg_enable_utf8, you can probably remove mention of it entirely.
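If it helps to see the two-step spelled out, here is a minimal sketch of what those driver flags are doing on your behalf, using only the core Encode module. The sample string is made up, and I'm assuming the data really is UTF-8. The attribute hashes at the end show where the settings listed above would go at connection time; the DSNs in the commented-out calls are hypothetical, so adjust them for your setup.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw bytes, as they might come back from a handle that does no decoding.
# These five bytes are the UTF-8 encoding of "cafe" with an acute accent
# on the e (the accented letter takes two bytes).
my $bytes = "caf\xC3\xA9";

# Decode on the way in: bytes -> Perl character string.
my $chars = decode('UTF-8', $bytes);

# Encode on the way out: character string -> bytes.
my $round_trip = encode('UTF-8', $chars);

printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
# prints "bytes: 5, characters: 4"

# Connect-time attributes that ask the driver to do this for you.
# The utf8 attribute names are the ones from the driver docs quoted above;
# RaiseError is a standard DBI attribute.
my %mysql_attr  = (mysql_enable_utf8 => 1, RaiseError => 1);
my %sqlite_attr = (sqlite_unicode    => 1, RaiseError => 1);

# Hypothetical DSNs and credentials, left commented so the sketch runs without DBI:
# my $dbh = DBI->connect('dbi:mysql:database=example', $user, $pass, \%mysql_attr);
# my $dbh = DBI->connect('dbi:SQLite:dbname=example.db', '', '', \%sqlite_attr);
```

The length difference is the whole point: until you decode, Perl counts bytes, not characters, so things like length, substr, and regex character classes will quietly do the wrong thing on non-ASCII text.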
:\
Update: s/simply/simple/;