in reply to Re^4: UTF-8 webpage output from MySQL
in thread UTF-8 webpage output from MySQL

I ran some examples with Devel::Peek and this is my result:

I try to output: johan from the database with DBI

SV = PV(0x8e4fe98) at 0x8cdc584
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,POK,pPOK)
  PV = 0x8f68b08 "Johan"\0
  CUR = 5
  LEN = 8

I try to output: Törjebjöåärne from the database with DBI
The UTF-8 flag is not set!

SV = PV(0x8e4fe98) at 0x8cdc584
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,POK,pPOK)
  PV = 0x8164d28 "T\303\266rjebj\303\266\303\245\303\244rne"\0
  CUR = 17
  LEN = 20

I try to output: testar from template

SV = PV(0x8e43e5c) at 0x8e46da4
  REFCNT = 1
  FLAGS = (TEMP,POK,pPOK)
  PV = 0x8ecbef0 "testar\n"\0
  CUR = 7
  LEN = 8

I try to output: testaråäöÅÄÖ from template
UTF-8 flag is set but I still get strange chars

SV = PV(0x8e43e5c) at 0x8e46da4
  REFCNT = 1
  FLAGS = (TEMP,POK,pPOK,UTF8)
  PV = 0x8eca838 "testar\357\277\275\357\277\275\357\277\275\357\277\275\357\277\275\357\277\275\n"\0 [UTF8 "testar\x{fffd}\x{fffd}\x{fffd}\x{fffd}\x{fffd}\x{fffd}\n"]
  CUR = 25
  LEN = 28

Re^6: UTF-8 webpage output from MySQL
by moritz (Cardinal) on Jan 23, 2008 at 15:21 UTC
    OK, so now we know that you have to decode the return values from DBI.

    And we know that your template isn't set up correctly.

    This works for me:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Template::Alloy;
    use Devel::Peek;

    binmode STDOUT, ':utf8';
    my $t = Template::Alloy->new(
        filename => "utf8test",
        ENCODING => 'UTF-8',
    );
    Dump $t->output;
    print $t->output;
    __END__

    file utf8test:
    testaråäöÅÄÖ
    ==============
    output:
    SV = PV(0x825c260) at 0x82d629c
      REFCNT = 1
      FLAGS = (TEMP,POK,pPOK,UTF8)
      PV = 0x82d5ea0 "testar\303\245\303\244\303\266\303\205\303\204\303\226\n"\0 [UTF8 "testar\x{e5}\x{e4}\x{f6}\x{c5}\x{c4}\x{d6}\n"]
      CUR = 19
      LEN = 20
    testaråäöÅÄÖ

    And this is what I get when I store the file utf8test as latin1 and run the script again:

    SV = PV(0x825c260) at 0x82d629c
      REFCNT = 1
      FLAGS = (TEMP,POK,pPOK,UTF8)
      PV = 0x8332cc0 "testar\357\277\275\357\277\275\357\277\275\357\277\275\357\277\275\357\277\275\n"\0 [UTF8 "testar\x{fffd}\x{fffd}\x{fffd}\x{fffd}\x{fffd}\x{fffd}\n"]
      CUR = 25
      LEN = 28
    testar������

    Strangely similar to your output, isn't it?

    So it seems that your template file is not in utf-8, and therefore every attempt to read it as utf-8 turns the invalid bytes into \x{fffd}, the "replacement character".

    So either recode your templates to utf-8 (future-proof) or read them with the right ENCODING option (presumably latin1).
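The effect described above can be reproduced without a template engine. The following is a minimal sketch, assuming the template bytes are latin1 as suspected; `$file_bytes` is a hypothetical stand-in for the file contents, and only the core Encode module is used.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "testaråäöÅÄÖ\n" as it would be stored in a latin1 file:
# one byte per character (hypothetical stand-in for the template file).
my $file_bytes = "testar\xe5\xe4\xf6\xc5\xc4\xd6\n";

# Wrong assumption: the bytes are not valid UTF-8, so the decoder
# substitutes U+FFFD (REPLACEMENT CHARACTER) for what it cannot decode.
my $as_utf8 = decode('UTF-8', $file_bytes);

# Right assumption: every latin1 byte maps to exactly one character.
my $as_latin1 = decode('ISO-8859-1', $file_bytes);

# Count the replacement characters produced by the wrong decode.
my $replaced = () = $as_utf8 =~ /\x{FFFD}/g;

binmode STDOUT, ':encoding(UTF-8)';
print "decoded as UTF-8:  $as_utf8";
print "decoded as latin1: $as_latin1";
print "replacement characters: $replaced\n";
```

Once the replacement characters are in the string the original letters are gone, so the fix has to happen where the file is read, not at output time.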

      Now, finally! My templates work, thanks to your input! Converting my templates to UTF-8 was a tedious task. I thought I could just open my latin1 templates and save them as UTF-8 without a BOM using, for example, UltraEdit, but that didn't work. I had to create completely new files and copy and paste the code into these new templates.

      Now it's just the database that is giving me a headache. I will try to decode the data somehow. I'll get back with my results.

      I got the database data to display correctly in browsers after decoding the data:

      use Encode;
      $db_data = decode_utf8($db_data);
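What the decode step changes can be seen on the bytes from the Devel::Peek dump of the DBI value earlier in the thread. This is a minimal sketch: `$db_data` is a hard-coded stand-in for a fetched column value, not an actual DBI call.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for a value fetched through DBI: the raw bytes shown in the dump
# ("Törjebjöåärne" encoded as UTF-8, UTF8 flag not set).
my $db_data = "T\303\266rjebj\303\266\303\245\303\244rne";

my $before_len  = length $db_data;                  # counts bytes
my $before_flag = utf8::is_utf8($db_data) ? 1 : 0;  # 0: Perl sees a byte string

# Turn the UTF-8 bytes into a Perl character string.
my $text = decode('UTF-8', $db_data);

my $after_len  = length $text;                      # counts characters
my $after_flag = utf8::is_utf8($text) ? 1 : 0;      # 1: character string

binmode STDOUT, ':encoding(UTF-8)';
print "before: length $before_len, UTF8 flag $before_flag\n";
print "after:  length $after_len, UTF8 flag $after_flag\n";
print "$text\n";
```

The bytes themselves are already valid UTF-8. Decoding only tells Perl to treat them as characters, so an output layer such as `:utf8` does not encode them a second time.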

      But is this really the way to go? It feels like the data in the database is still latin1 and that I have to do something with the data instead?

      I ran some tests suggested elsewhere:

      #1 - USE MySQL CHAR_LENGTH TO FIND ROWS WITH MULTI-BYTE CHARACTERS:
      SELECT CLUB_NAME FROM SUME_CLUB_TMP
      WHERE LENGTH( CLUB_NAME ) != CHAR_LENGTH( CLUB_NAME )

      Result
      --------------
      Törstar

      #2 - USE MySQL HEX and Perl bin2hex
      SELECT CLUB_NAME, HEX(CLUB_NAME) FROM SUME_CLUB_TMP

      Database
      ---------
      törstar 74C3B67273746172

      Perl bin2hex
      ---------
      74f67273746172 törstar

      #3 - SEE IT IN BOTH ENCODINGS
      SET CLUB_NAME latin1;
      SELECT CLUB_NAME, HEX(CLUB_NAME) FROM SUME_CLUB_TMP;

      A databasecall in perl - utf8
      ---------
      törstar 74C3B67273746172

      A databasecall in perl - latin1
      ---------
      törstar 74C3B67273746172
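The two hex strings in test #2 can be compared against what Perl produces when it encodes "törstar" both ways. The sketch below is a rough check, not a diagnosis: `$name` is typed in by hand rather than read from the database. In MySQL, LENGTH() counts bytes and CHAR_LENGTH() counts characters, which corresponds to the two `length` calls here.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "törstar" as a Perl character string (ö is U+00F6).
my $name = "t\x{f6}rstar";

# Encode the same characters in both candidate encodings.
my $utf8_bytes   = encode('UTF-8', $name);
my $latin1_bytes = encode('ISO-8859-1', $name);

# Hex dump of each byte string, comparable to MySQL's HEX() output
# (unpack produces lowercase hex digits).
my $utf8_hex   = unpack('H*', $utf8_bytes);
my $latin1_hex = unpack('H*', $latin1_bytes);

print "characters: ", length($name), "\n";
print "UTF-8:  $utf8_hex (", length($utf8_bytes), " bytes)\n";
print "latin1: $latin1_hex (", length($latin1_bytes), " bytes)\n";
```

The UTF-8 encoding gives 74c3b67273746172, the same bytes HEX(CLUB_NAME) reports, which suggests the column already holds UTF-8. The latin1 encoding gives 74f67273746172, matching the Perl bin2hex line.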