isync has asked for the wisdom of the Perl Monks concerning the following question:

I am inserting data into a mysql table via perl. As far as my checks go, I do insert proper utf8 octets/flagged variables.

So far so good.

I am ssh'ed to the server, so I open mysql to manually check the result in the table. As expected the SELECT statement returns a listing where all umlauts are properly displayed.

But! When I use phpmyadmin to have a look at the exact same results set, I get ü for ü - although my phpmyadmin is set to utf8, my browser displays utf8 and the database has collation=utf8_general_ci on the table and the field.

Then I try my utf8 properly displaying cgi script and it behaves exactly like phpmyadmin, giving ü for ü.

As it seems something is mixed up with my MySQL input/output streams but I have to few knowledge of this to sort out what is wrong.
Is is possible to have utf8 data in a utf8 table, but the data is outputtet by mysql as latin1?


I found on another thread that posting my mysql environment variables might be of help:
| character_set_client | latin1
| character_set_connection | latin1
| character_set_database | latin1
| character_set_filesystem | binary
| character_set_results | latin1
| character_set_server | latin1
| character_set_system | utf8
  • Comment on Is it perl or MySQL? ü ends up in CGI script

Replies are listed 'Best First'.
Re: Is it perl or MySQL? ü ends up in CGI script
by moritz (Cardinal) on Feb 08, 2008 at 14:28 UTC
    Is is possible to have utf8 data in a utf8 table, but the data is outputtet by mysql as latin1?

    It should be possible, but IMHO that's not worth the trouble.

    Really, charset issues are ugly to work with, and increasing the ugliness by using a terminal with a different encoding than you work with is just plain insane.

    I tried that for a while, end then I switched everything to UTF-8 - my files, terminal, locales, cgi output, screen, irssi - you name it.

    So my advice is:

    • Get a clean environment with utf-8 locales
    • dump your databases, and delete them
    • Set any charset related configuration variable in mysql to utf-8
    • Import the data, if possible use Encode::Guess to check if the data is utf-8 or latin1. If it is latin1, recode.
    • Have fun!

    mysql's charset handling is rather mysterious to me, and seemingly fragile, so I advise not to fiddle with it in complicated ways. (But if you find somebody who can tell you how to get it to work, you can use that route, of course).

      Now, how do I alter the mysqld environment: All of the following in the my.cnf fail:
      # character-set-client =utf8
      # character-set-connection =utf8
      # character-set-database =utf8
      # set-variable = character_set_filesystem =binary
      # set-variable = character_set_results =utf8
      # set-variable = character_set_server =utf8
      # set-variable = character_set_system =utf8
      # set-variable = character_sets_dir =/usr/share/mysql/charsets/
      # collation-connection =utf8_general_ci
      # collation-database = utf8_general_ci
      # collation-server = utf8_general_ci

      Update: (THREAD SOLVED)
      Ok, let's check your current configuration:
      $ mysql > SHOW VARIABLES;
      You probably see lots of "latin1" for variables with a charset_* prefix. These are common defaults.

      You might want to change this. Compare the manual to understand what we do now.
      You can set the MySQL config variables globally in the my.cnf config file, with:
      [mysqld] ... init_connect="SET collation_connection = utf8_general_ci" init_connect="SET NAMES utf8" default-character-set = utf8 character-set-server = utf8 collation-server = utf8_general_ci [client] default-character-set = utf8 [mysqldump] ... default-character-set = utf8

      Or you can do this on a "per connection" basis.
      For example, when you use DBI and MySQL, set mysql_enable_utf8 to a true value via the extra params passed to the driver upon connection.

      (This thread is marked as being solved, as I found out, most problems similar to the described one here are coming from the MySQL server being in default/latin1 mode while you try to have everything else in your pipeline in utf8. Changing this last element to utf8 solves it.)
      Is is possible to have utf8 data in a utf8 table, but the data is outputtet by mysql as latin1?

      actually, that wasn't my intention. But it seems to be the only explanation for the error - so I wanted to know if this is generally possible...

      btw: I will now do as advised and set all env variables to utf8.
Re: Is it perl or MySQL? ü ends up in CGI script
by TedYoung (Deacon) on Feb 08, 2008 at 15:07 UTC

    For what it is worth, we call this problem "Irritable Vowel Syndrome"

    Ted Young

Re: Is it perl or MySQL? ü ends up in CGI script
by olus (Curate) on Feb 08, 2008 at 14:52 UTC

    Maybe you could try to encode the results from your queries to utf8 in your script.

    ... use Encode qw ( from_to ); ... my @row = $stmt->fetchrow_array(); map{ from_to($_, "latin1","utf8"); } @row; ...
Re: Is it perl or MySQL? ü ends up in CGI script
by oko1 (Deacon) on Feb 08, 2008 at 14:21 UTC
    What happens when you do a simple text SELECT query? That'll separate the MySQL::Men from the HTML::Sheep.
Re: Is it perl or MySQL? ü ends up in CGI script
by isync (Hermit) on Feb 08, 2008 at 16:10 UTC
    Now I configured my environment right:
    | character_set_client | utf8
    | character_set_connection | utf8
    | character_set_database | utf8
    | character_set_filesystem | binary
    | character_set_results | utf8
    | character_set_server | utf8
    | character_set_system | utf8
    | character_sets_dir | /usr/share/mysql/charsets/ |
    | collation_connection | utf8_general_ci
    | collation_database | utf8_general_ci
    | collation_server | utf8_general_ci

    But the problem persists in a new iteration:
    in phpmyadmin: the same ü for ü
    but now, also via console # mysql > SELECT ...
    I get garbled data. So far it seemed a problem of the output only....

    Any ideas?
      Does that happen for clean UTF-8 data that you inserted after changing the configuration?
        Yes, but now it works, as can be seen in my update, two more line of configuration make a difference (altough I don't fully understand, what I did...)