in reply to Re^5: UTF-8 webpage output from MySQL
in thread UTF-8 webpage output from MySQL

Ok, so now we know that you have to decode the return values from DBI.

And we know that your template isn't set up correctly.

This works for me:

#!/usr/bin/perl
use strict;
use warnings;
use Template::Alloy;
use Devel::Peek;
binmode STDOUT, ':utf8';

my $t = Template::Alloy->new(
    filename => "utf8test",
    ENCODING => 'UTF-8',
);
Dump $t->output;
print $t->output;
__END__
file utf8test:
testaråäöÅÄÖ
==============
output:
SV = PV(0x825c260) at 0x82d629c
  REFCNT = 1
  FLAGS = (TEMP,POK,pPOK,UTF8)
  PV = 0x82d5ea0 "testar\303\245\303\244\303\266\303\205\303\204\303\226\n"\0 [UTF8 "testar\x{e5}\x{e4}\x{f6}\x{c5}\x{c4}\x{d6}\n"]
  CUR = 19
  LEN = 20
testaråäöÅÄÖ

And this is what I get when I store the file utf8test as latin1 and run the script again:

SV = PV(0x825c260) at 0x82d629c
  REFCNT = 1
  FLAGS = (TEMP,POK,pPOK,UTF8)
  PV = 0x8332cc0 "testar\357\277\275\357\277\275\357\277\275\357\277\275\357\277\275\357\277\275\n"\0 [UTF8 "testar\x{fffd}\x{fffd}\x{fffd}\x{fffd}\x{fffd}\x{fffd}\n"]
  CUR = 25
  LEN = 28
testar������

Strangely similar to your output, isn't it?

So it seems that your template file is not in utf-8, and therefore every attempt to read it as utf-8 turns each non-ASCII byte into \x{fffd}, the Unicode "replacement character".
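A minimal sketch of that effect, using only the core Encode module (the variable names are mine, not from your script): a lone latin1 byte such as 0xE5 is not a valid UTF-8 sequence, so a lenient UTF-8 decode substitutes U+FFFD, while decoding with the encoding the bytes are really in keeps the character.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "testar" followed by a-ring stored as a single latin1 byte (0xE5).
my $latin1_bytes = "testar\xe5\n";

# Lenient decode (default CHECK): malformed input is replaced by U+FFFD.
my $as_utf8 = decode('UTF-8', $latin1_bytes);

# Decoding as latin1 maps the byte to the character it was meant to be.
my $as_latin1 = decode('ISO-8859-1', $latin1_bytes);

# Inspect the character right after "testar" (index 6) in each result.
printf "read as UTF-8:  U+%04X\n", ord substr($as_utf8,   6, 1);
printf "read as latin1: U+%04X\n", ord substr($as_latin1, 6, 1);
```

The first line should report U+FFFD and the second U+00E5, which matches the two Dump outputs.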

So either recode your templates to utf-8 (future-proof) or read them with the right ENCODING option (presumably latin1).
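If you go the recoding route, something along these lines might do it. This is only a sketch: recode_latin1_to_utf8 is a hypothetical helper name, it assumes the file really is latin1, and the demo runs on a throwaway temp file rather than on your templates.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: rewrite a latin1 file as UTF-8 in place.
sub recode_latin1_to_utf8 {
    my ($path) = @_;

    # Read the whole file, decoding its bytes as latin1.
    open my $in, '<:encoding(ISO-8859-1)', $path or die "read $path: $!";
    my $text = do { local $/; <$in> };
    close $in;

    # Write the same characters back, encoded as UTF-8 (no BOM is added).
    open my $out, '>:encoding(UTF-8)', $path or die "write $path: $!";
    print {$out} $text;
    close $out or die "close $path: $!";
    return;
}

# Demo: a temp file holding "testar" plus a latin1 a-ring (byte 0xE5).
my ($fh, $file) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "testar\xe5\n";
close $fh;

recode_latin1_to_utf8($file);

# Read the raw bytes back and show them as hex.
open my $check, '<:raw', $file or die "read $file: $!";
my $bytes = do { local $/; <$check> };
close $check;
print unpack('H*', $bytes), "\n";
```

After the call, the single byte e5 should have become the two bytes c3 a5. Run it only once per file: a file that is already UTF-8 would get double-encoded.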

Re^7: UTF-8 webpage output from MySQL
by boboson (Monk) on Jan 24, 2008 at 08:35 UTC

    Now, finally! My templates work, thanks to your input! Converting my templates to UTF-8 was a tedious task. I thought I could just open my latin1 templates and save them as UTF-8 without a BOM with, for example, Ultra Edit, but that didn't work. I had to create completely new files and copy and paste the code into these new templates.

    Now it's just the database that is giving me a headache. I will try to decode the data somehow. I'll get back with my results.

Re^7: UTF-8 webpage output from MySQL
by boboson (Monk) on Jan 25, 2008 at 07:42 UTC

    I got the database data to display correctly in browsers after decoding the data:

    use Encode;
    $db_data = decode_utf8($db_data);

    But is this really the way to go? It feels like the data in the database is still latin1 and that I have to do something with the data itself instead?

    I ran some tests suggested elsewhere:

    #1 - USE MySQL CHAR_LENGTH TO FIND ROWS WITH MULTI-BYTE CHARACTERS:
    SELECT CLUB_NAME FROM SUME_CLUB_TMP WHERE LENGTH( CLUB_NAME ) != CHAR_LENGTH( CLUB_NAME )

    Result
    --------------
    Törstar

    #2 - USE MySQL HEX and Perl bin2hex
    SELECT CLUB_NAME, HEX(CLUB_NAME) FROM SUME_CLUB_TMP

    Database
    ---------
    törstar 74C3B67273746172

    Perl bin2hex
    ---------
    74f67273746172 törstar

    #3 - SEE IT IN BOTH ENCODINGS
    SET CLUB_NAME latin1;
    SELECT CLUB_NAME, HEX(CLUB_NAME) FROM SUME_CLUB_TMP;

    A databasecall in perl - utf8
    ---------
    törstar 74C3B67273746172

    A databasecall in perl - latin1
    ---------
    törstar 74C3B67273746172
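
A small sketch, using only the core Encode module, of how the two hex strings in test #2 relate to each other. The variable names are illustrative, and the reading of the results is an assumption based on the hex values alone: 74C3B67273746172 is the UTF-8 encoding of "törstar", while 74f67273746172 is the same text with ö as the single code point/byte F6. That would suggest the column already holds UTF-8 bytes and that decoding on the Perl side is the missing step.

```perl
use strict;
use warnings;
use Encode qw(encode decode_utf8);

# The bytes MySQL's HEX() reported for the club name.
my $db_bytes = pack 'H*', '74C3B67273746172';

# Interpreted as UTF-8, the 8 bytes become 7 characters.
my $name = decode_utf8($db_bytes);
printf "bytes=%d chars=%d\n", length $db_bytes, length $name;

# Re-encoding the decoded string reproduces both hex dumps from test #2.
print "utf8:   ", unpack('H*', encode('UTF-8',      $name)), "\n";
print "latin1: ", unpack('H*', encode('ISO-8859-1', $name)), "\n";
```

The byte count exceeding the character count is the same signal that the LENGTH versus CHAR_LENGTH query in test #1 looks for.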