in thread UTF-8 webpage output from MySQL

That's very hard to guess without seeing your code.

Since you have a binmode STDOUT, ':utf8'; somewhere, you don't need to encode the template's output anymore. Chances are that you don't need to encode anything at all.

The next debugging step is to check the data from the database. Do these strings have the UTF8 flag set? (Remember Devel::Peek.) You can also check the codepoints to see if the data arrived correctly.
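
As a lighter-weight companion to Devel::Peek, the flag and codepoint check could be sketched as below. The helper name describe and the sample string are made up for illustration; utf8::is_utf8 and Encode::decode are core Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: report the UTF8 flag and the codepoints of a string.
sub describe {
    my ($str) = @_;
    my $flag = utf8::is_utf8($str) ? 'UTF8 flag on' : 'UTF8 flag off';
    my $cps  = join ' ', map { sprintf 'U+%04X', ord($_) } split //, $str;
    return "$flag: $cps";
}

my $bytes = "t\xC3\xB6r";             # UTF-8 encoded octets, as a driver might return them
my $text  = decode('UTF-8', $bytes);  # the same data decoded into characters

print describe($bytes), "\n";  # UTF8 flag off: U+0074 U+00C3 U+00B6 U+0072
print describe($text),  "\n";  # UTF8 flag on: U+0074 U+00F6 U+0072
```

If the database strings look like the first line (two codepoints C3 B6 where one ö is expected), they are still undecoded octets.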

Check the same thing for the template's output.

Also make sure that you have warnings enabled, and check your error.log for warnings.

Re^5: UTF-8 webpage output from MySQL
by boboson (Monk) on Jan 23, 2008 at 14:27 UTC

    I ran some examples with Devel::Peek and this is my result:

    I try to output: johan from the database with DBI

    SV = PV(0x8e4fe98) at 0x8cdc584
      REFCNT = 1
      FLAGS = (PADBUSY,PADMY,POK,pPOK)
      PV = 0x8f68b08 "Johan"\0
      CUR = 5
      LEN = 8

    I try to output: Törjebjöåärne from the database with DBI
    The UTF-8 flag is not set!

    SV = PV(0x8e4fe98) at 0x8cdc584
      REFCNT = 1
      FLAGS = (PADBUSY,PADMY,POK,pPOK)
      PV = 0x8164d28 "T\303\266rjebj\303\266\303\245\303\244rne"\0
      CUR = 17
      LEN = 20

    I try to output: testar from template

    SV = PV(0x8e43e5c) at 0x8e46da4
      REFCNT = 1
      FLAGS = (TEMP,POK,pPOK)
      PV = 0x8ecbef0 "testar\n"\0
      CUR = 7
      LEN = 8

    I try to output: testaråäöÅÄÖ from template
    UTF-8 flag is set but I still get strange chars

    SV = PV(0x8e43e5c) at 0x8e46da4
      REFCNT = 1
      FLAGS = (TEMP,POK,pPOK,UTF8)
      PV = 0x8eca838 "testar\357\277\275\357\277\275\357\277\275\357\277\275\357\277\275\357\277\275\n"\0 [UTF8 "testar\x{fffd}\x{fffd}\x{fffd}\x{fffd}\x{fffd}\x{fffd}\n"]
      CUR = 25
      LEN = 28

      OK, so now we know that you have to decode the return values from DBI.

      And we know that your template isn't set up correctly.

      This works for me:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Template::Alloy;
      use Devel::Peek;
      binmode STDOUT, ':utf8';
      my $t = Template::Alloy->new(
          filename => "utf8test",
          ENCODING => 'UTF-8',
      );
      Dump $t->output;
      print $t->output;
      __END__

      file utf8test:
      testaråäöÅÄÖ
      ==============
      output:
      SV = PV(0x825c260) at 0x82d629c
        REFCNT = 1
        FLAGS = (TEMP,POK,pPOK,UTF8)
        PV = 0x82d5ea0 "testar\303\245\303\244\303\266\303\205\303\204\303\226\n"\0 [UTF8 "testar\x{e5}\x{e4}\x{f6}\x{c5}\x{c4}\x{d6}\n"]
        CUR = 19
        LEN = 20
      testaråäöÅÄÖ

      And this is what I get when I store the file utf8test as latin1 and run the script again:

      SV = PV(0x825c260) at 0x82d629c
        REFCNT = 1
        FLAGS = (TEMP,POK,pPOK,UTF8)
        PV = 0x8332cc0 "testar\357\277\275\357\277\275\357\277\275\357\277\275\357\277\275\357\277\275\n"\0 [UTF8 "testar\x{fffd}\x{fffd}\x{fffd}\x{fffd}\x{fffd}\x{fffd}\n"]
        CUR = 25
        LEN = 28
      testar������

      Strangely similar to your output, isn't it?

      So it seems that your template file is not in UTF-8, and therefore all attempts to read it as UTF-8 result in the \x{fffd} "replacement character".

      So either recode your templates to utf-8 (future-proof) or read them with the right ENCODING option (presumably latin1).
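
The recoding option could be sketched roughly as follows, assuming the templates really are ISO-8859-1 (latin1). The helper name latin1_to_utf8 is hypothetical, and the sample string stands in for the raw bytes of a template file.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn ISO-8859-1 octets into UTF-8 octets.
# Assumes the input really is latin1; other encodings need a different name here.
sub latin1_to_utf8 {
    my ($octets) = @_;
    my $chars = decode('ISO-8859-1', $octets);  # octets -> characters
    return encode('UTF-8', $chars);             # characters -> UTF-8 octets
}

my $latin1 = "testar\xE5\xE4\xF6\xC5\xC4\xD6\n";  # template content stored as latin1
my $utf8   = latin1_to_utf8($latin1);

printf "%d bytes in, %d bytes out\n", length($latin1), length($utf8);
# 13 bytes in, 19 bytes out
```

The 19 output bytes match the CUR = 19 in the working dump above. To apply this to real files, read and write them in raw mode so no extra I/O layer re-encodes the data.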

        I got the database data to display correctly in browsers after decoding the data:

        use Encode;
        $db_data = decode_utf8($db_data);

        But is this really the way to go? It feels like the data in the database is still latin1 and that I have to do something with the data instead.
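
What the decoding step does can be seen on the octets from the earlier Devel::Peek dump. This is only a sketch: the literal below is copied from that dump and stands in for a value fetched through DBI.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Octets as shown in the dump of the DBI result (UTF8 flag not set, CUR = 17).
my $db_data = "T\303\266rjebj\303\266\303\245\303\244rne";

# decode_utf8 returns a new character string; the argument is left unchanged here.
my $text = decode_utf8($db_data);

printf "octets: %d, characters: %d, UTF8 flag: %s\n",
    length($db_data), length($text), utf8::is_utf8($text) ? 'on' : 'off';
# octets: 17, characters: 13, UTF8 flag: on
```

That the octets decode cleanly into 13 characters suggests the driver is already delivering UTF-8 bytes, so decoding on the Perl side is a reasonable step rather than a workaround.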

        I ran some tests suggested elsewhere:

        #1 - USE MySQL CHAR_LENGTH TO FIND ROWS WITH MULTI-BYTE CHARACTERS:
        SELECT CLUB_NAME FROM SUME_CLUB_TMP WHERE LENGTH( CLUB_NAME ) != CHAR_LENGTH( CLUB_NAME )

        Result
        --------------
        Törstar

        #2 - USE MySQL HEX and Perl bin2hex
        SELECT CLUB_NAME, HEX(CLUB_NAME) FROM SUME_CLUB_TMP

        Database
        ---------
        törstar 74C3B67273746172

        Perl bin2hex
        ---------
        74f67273746172 törstar

        #3 - SEE IT IN BOTH ENCODINGS
        SET CLUB_NAME latin1;
        SELECT CLUB_NAME, HEX(CLUB_NAME) FROM SUME_CLUB_TMP;

        A databasecall in perl - utf8
        ---------
        törstar 74C3B67273746172

        A databasecall in perl - latin1
        ---------
        törstar 74C3B67273746172
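
The two hex strings in test #2 can be reproduced in Perl to see which encoding each one corresponds to. A minimal sketch, with the name written using an escape so the script does not depend on the source file's own encoding:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $name = "t\x{f6}rstar";  # "törstar" as a character string

# Hex dump of the same characters in each encoding.
my $hex_utf8   = uc(unpack('H*', encode('UTF-8',      $name)));
my $hex_latin1 = uc(unpack('H*', encode('ISO-8859-1', $name)));

print "UTF-8:  $hex_utf8\n";    # UTF-8:  74C3B67273746172
print "latin1: $hex_latin1\n";  # latin1: 74F67273746172
```

C3 B6 is the UTF-8 encoding of ö and F6 is its latin1 encoding, so a HEX() result of 74C3B67273746172 corresponds to UTF-8 bytes in the column.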

        Now, finally, my templates work, thanks to your input! Converting my templates to UTF-8 was a tedious task. I thought I could just open my latin1 templates and save them as UTF-8 without a BOM in, for example, Ultra Edit, but that didn't work. I had to create completely new files and copy and paste the code into these new templates.

        Now it's just the database that is giving me a headache. I will try to decode the data somehow. I'll get back with my results.