in reply to Re^2: UTF-8 webpage output from MySQL
in thread UTF-8 webpage output from MySQL

OK, I overrode the load_tmpl function like this:

sub load_tmpl {
    my $self = shift;
    my ($tmpl_file, @extra_params) = @_;

    # add tmpl_path to path array if one is set, otherwise add a path arg
    if (my $tmpl_path = $self->tmpl_path) {
        my @tmpl_paths = (ref $tmpl_path eq 'ARRAY') ? @$tmpl_path : $tmpl_path;
        my $found = 0;
        for ( my $x = 0; $x < @extra_params; $x += 2 ) {
            if ($extra_params[$x] eq 'path' and ref $extra_params[$x+1] eq 'ARRAY') {
                unshift @{$extra_params[$x+1]}, @tmpl_paths;
                $found = 1;
                last;
            }
        }
        push(@extra_params, path => [ @tmpl_paths ]) unless $found;
    }

    my %tmpl_params = ();
    my %ht_params = @extra_params;
    %ht_params = () unless keys %ht_params;

    # Define our extension if it doesn't already exist
    $self->{__CURRENT_TMPL_EXTENSION} = '.html'
        unless defined $self->{__CURRENT_TMPL_EXTENSION};

    # Define a default template name based on the current run mode
    unless (defined $tmpl_file) {
        $tmpl_file = $self->get_current_runmode . $self->{__CURRENT_TMPL_EXTENSION};
    }

    $self->call_hook('load_tmpl', \%ht_params, \%tmpl_params, $tmpl_file);

    #require HTML::Template;
    # let's check $tmpl_file and see what kind of parameter it is - we
    # now support 3 options: scalar (filename), ref to scalar (the
    # actual html/template content) and reference to FILEHANDLE
    #my $t = undef;
    #if ( ref $tmpl_file eq 'SCALAR' ) {
    #    $t = HTML::Template->new_scalar_ref( $tmpl_file, %ht_params );
    #} elsif ( ref $tmpl_file eq 'GLOB' ) {
    #    $t = HTML::Template->new_filehandle( $tmpl_file, %ht_params );
    #} else {
    #    $t = HTML::Template->new_file($tmpl_file, %ht_params);
    #}

    require Template::Alloy;
    my $t = undef;
    if ( ref $tmpl_file eq 'SCALAR' ) {
        # ref to scalar: the template content itself
        $t = Template::Alloy->new(type => 'scalarref', source => $tmpl_file, %ht_params, ENCODING => 'UTF-8');
    } elsif ( ref $tmpl_file eq 'GLOB' ) {
        # filehandle to read the template from
        $t = Template::Alloy->new(type => 'filehandle', source => $tmpl_file, %ht_params, ENCODING => 'UTF-8');
    } else {
        # plain file name
        $t = Template::Alloy->new(type => 'filename', source => $tmpl_file, %ht_params, ENCODING => 'UTF-8');
    }

    if (keys %tmpl_params) {
        $t->param(%tmpl_params);
    }

    return $t;
}

It works, but I still get strange characters. My template files display the strange question-mark symbol in Firefox and an empty square box in IE instead of å, ä and ö:

&#65533;ndra ditt l&#65533;senord

The data from my database still displays as before (it should read Törjebjöåärne):

Törjebjöåärne

I have tried to save my template files as UTF-8 without BOM, without any success. Am I doing some kind of double encoding? Why does it give me different characters, sometimes an ö will give me &#65533; and sometimes ö

Re^4: UTF-8 webpage output from MySQL
by moritz (Cardinal) on Jan 23, 2008 at 10:34 UTC
    That's very hard to guess without seeing your code.

    Since you have a binmode STDOUT, ':utf8'; somewhere, you don't need to encode the template's output anymore. Chances are that you don't need to encode anything at all.
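The double encoding asked about above can be sketched like this (an illustration, not code from the thread). It writes to an in-memory filehandle that carries the same :utf8 layer as STDOUT: a character string is encoded exactly once by the layer, while a string that was already encoded with Encode::encode has each of its bytes encoded a second time, which is how ö ends up as ö in the browser.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

my $text = "\x{f6}";    # the character ö as a Perl character string

# In-memory "file" with the same :utf8 layer that is set on STDOUT
open my $fh, '>:utf8', \my $buf or die "Cannot open in-memory file: $!";
print $fh $text;                     # correct: the layer encodes once -> C3 B6
print $fh encode('UTF-8', $text);    # wrong: the bytes are encoded again -> C3 83 C2 B6
close $fh;

# Show the bytes that would have reached the browser
my $hex = join ' ', map { sprintf '%02X', ord } split //, $buf;
print "$hex\n";    # C3 B6 C3 83 C2 B6
```

The first two bytes render as ö; the last four render as ö in a browser that is told the page is UTF-8.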

    The next debugging step is to check the data from the database. Do these strings have the UTF8 flag set? (Remember Devel::Peek.) You can also check the code points to see whether the data arrived correctly.

    Check the same thing for the template's output.
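A minimal sketch of that check without reading Devel::Peek dumps. The helper name describe is hypothetical; utf8::is_utf8 and Encode::decode are core Perl. The sample is an ö as raw UTF-8 bytes (flag off, two code points) versus the same text after decoding (flag on, one code point).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: report the UTF8 flag and the code points of a string
sub describe {
    my ($s) = @_;
    my $flag = utf8::is_utf8($s) ? 'on' : 'off';
    my $cps  = join ' ', map { sprintf 'U+%04X', ord } split //, $s;
    return "UTF8 flag $flag: $cps";
}

my $from_db = "\303\266";                  # raw UTF-8 bytes for ö, flag off
my $decoded = decode('UTF-8', $from_db);   # the same text as one character

printf "%s\n", describe($from_db);   # UTF8 flag off: U+00C3 U+00B6
printf "%s\n", describe($decoded);   # UTF8 flag on: U+00F6
```

If a string that should contain ö reports U+00C3 U+00B6, it still holds undecoded UTF-8 bytes.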

    Also make sure that you have warnings enabled, and check your error.log for warnings.

      I ran some examples with Devel::Peek, and these are my results:

      Outputting "Johan" from the database with DBI:

      SV = PV(0x8e4fe98) at 0x8cdc584
        REFCNT = 1
        FLAGS = (PADBUSY,PADMY,POK,pPOK)
        PV = 0x8f68b08 "Johan"\0
        CUR = 5
        LEN = 8

      Outputting "Törjebjöåärne" from the database with DBI. The UTF-8 flag is not set!

      SV = PV(0x8e4fe98) at 0x8cdc584
        REFCNT = 1
        FLAGS = (PADBUSY,PADMY,POK,pPOK)
        PV = 0x8164d28 "T\303\266rjebj\303\266\303\245\303\244rne"\0
        CUR = 17
        LEN = 20

      Outputting "testar" from the template:

      SV = PV(0x8e43e5c) at 0x8e46da4
        REFCNT = 1
        FLAGS = (TEMP,POK,pPOK)
        PV = 0x8ecbef0 "testar\n"\0
        CUR = 7
        LEN = 8

      Outputting "testaråäöÅÄÖ" from the template. The UTF-8 flag is set, but I still get strange characters:

      SV = PV(0x8e43e5c) at 0x8e46da4
        REFCNT = 1
        FLAGS = (TEMP,POK,pPOK,UTF8)
        PV = 0x8eca838 "testar\357\277\275\357\277\275\357\277\275\357\277\275\357\277\275\357\277\275\n"\0 [UTF8 "testar\x{fffd}\x{fffd}\x{fffd}\x{fffd}\x{fffd}\x{fffd}\n"]
        CUR = 25
        LEN = 28
        OK, so now we know that you have to decode the return values from DBI.
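A minimal sketch of that decoding step, assuming the column values are UTF-8 bytes as in the Devel::Peek dump of Törjebjöåärne. The helper decode_row and the sample row are hypothetical; in real code the hash ref would come from something like $sth->fetchrow_hashref. Depending on your DBD::mysql version, the mysql_enable_utf8 connect attribute may do this decoding for you instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode every defined column of a fetched row from UTF-8 bytes
sub decode_row {
    my ($row) = @_;
    my %decoded = map {
        ($_ => (defined $row->{$_} ? decode('UTF-8', $row->{$_}) : undef))
    } keys %$row;
    return \%decoded;
}

# Stand-in for $sth->fetchrow_hashref: the bytes from the Devel::Peek dump
my $row     = { name => "T\303\266rjebj\303\266\303\245\303\244rne", note => undef };
my $decoded = decode_row($row);

printf "%d bytes -> %d characters\n", length $row->{name}, length $decoded->{name};   # 17 bytes -> 13 characters
```

After decoding, the strings are character strings and the :utf8 layer on STDOUT encodes them exactly once.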

        And we know that your template isn't set up correctly.

        This works for me:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Template::Alloy;
        use Devel::Peek;

        binmode STDOUT, ':utf8';

        my $t = Template::Alloy->new(
            filename => "utf8test",
            ENCODING => 'UTF-8',
        );

        Dump $t->output;
        print $t->output;
        __END__
        file utf8test:
        testaråäöÅÄÖ
        ==============
        output:
        SV = PV(0x825c260) at 0x82d629c
          REFCNT = 1
          FLAGS = (TEMP,POK,pPOK,UTF8)
          PV = 0x82d5ea0 "testar\303\245\303\244\303\266\303\205\303\204\303\226\n"\0 [UTF8 "testar\x{e5}\x{e4}\x{f6}\x{c5}\x{c4}\x{d6}\n"]
          CUR = 19
          LEN = 20
        testaråäöÅÄÖ

        And this is what I get when I store the file utf8test in Latin-1 and run the script again:

        SV = PV(0x825c260) at 0x82d629c REFCNT = 1 FLAGS = (TEMP,POK,pPOK,UTF8) PV = 0x8332cc0 "testar\357\277\275\357\277\275\357\277\275\357\277\2 +75\357\277\275\357\277\275\n"\0 [UTF8 "testar\x{fffd}\x{fffd}\x{fffd} +\x{fffd}\x{fffd}\x{fffd}\n"] CUR = 25 LEN = 28 testar&#65533;&#65533;&#65533;&#65533;&#65533;&#65533;

        Strangely similar to your output, isn't it?

        So it seems that your template file is not in UTF-8, and therefore every attempt to read it as UTF-8 results in \x{fffd}, the "replacement character".
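To confirm that diagnosis outside Template::Alloy, a sketch that checks whether a template's bytes are valid UTF-8. looks_like_utf8 is a hypothetical helper, and the two strings stand in for file contents read with a :raw layer. It also shows that Encode's default, lenient decoding substitutes U+FFFD for the invalid bytes, which matches the dumps above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: true if the bytes decode cleanly as UTF-8, false otherwise
sub looks_like_utf8 {
    my ($bytes) = @_;
    return eval { decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC()); 1 } ? 1 : 0;
}

my $utf8_file   = "testar\303\245\303\244\303\266\n";   # åäö saved as UTF-8
my $latin1_file = "testar\345\344\366\n";               # åäö saved as Latin-1

printf "%s\n", looks_like_utf8($utf8_file)   ? 'UTF-8' : 'not UTF-8';   # UTF-8
printf "%s\n", looks_like_utf8($latin1_file) ? 'UTF-8' : 'not UTF-8';   # not UTF-8

# Lenient decoding (the default) replaces the bad bytes with U+FFFD
my $lenient = decode('UTF-8', $latin1_file);
print "contains U+FFFD\n" if index($lenient, "\x{fffd}") >= 0;
```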

        So either recode your templates to utf-8 (future-proof) or read them with the right ENCODING option (presumably latin1).
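For the recoding route, a sketch in Perl. latin1_to_utf8 and recode_file are hypothetical helpers, and the sketch assumes the templates really are Latin-1: run it only once per file, because recoding a file that is already UTF-8 would double-encode it. The other route is to leave the files as they are and pass ENCODING => 'latin1' (presumably, as suggested above) instead of 'UTF-8' to Template::Alloy->new.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn Latin-1 bytes into UTF-8 bytes
sub latin1_to_utf8 {
    my ($bytes) = @_;
    return encode('UTF-8', decode('iso-8859-1', $bytes));
}

# Hypothetical helper: recode one template file in place (run once per file!)
sub recode_file {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    my $bytes = do { local $/; <$in> } // '';
    close $in;
    open my $out, '>:raw', $path or die "Cannot write $path: $!";
    print $out latin1_to_utf8($bytes);
    close $out or die "Cannot close $path: $!";
}

printf "%s\n", latin1_to_utf8("testar\345\344\366") eq "testar\303\245\303\244\303\266" ? 'ok' : 'mismatch';   # ok
```

Usage would be something like recode_file($_) for glob('templates/*.html'), where the directory name is only an example.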