in reply to UTF-8 webpage output from MySQL

The normal workflow is:
  1. Decode all incoming data (with Encode::decode or with an IO layer)
  2. Work with your data
  3. Encode all outgoing data (with Encode::encode or an IO layer)
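As a minimal sketch of those three steps, using only the core Encode module: the hard-coded byte string stands in for whatever you actually read from a file, socket or database, and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Step 1: decode incoming bytes into a Perl text string.
# (Illustrative input: the UTF-8 bytes for "åäö".)
my $bytes_in = "\xC3\xA5\xC3\xA4\xC3\xB6";
my $text     = decode('UTF-8', $bytes_in);

# Step 2: work with the data as text; length() and uc() now
# operate on characters rather than bytes.
my $char_count = length $text;
my $upper      = uc $text;

# Step 3: encode the text string back to bytes on the way out.
my $bytes_out = encode('UTF-8', $upper);
```

Between steps 1 and 3 the string holds three characters; only the encoded output is six bytes long again.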

The problem is that HTML::Template doesn't support step one - it always reads templates as binary data. Once you mix that with decoded data, you're lost.
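To see what goes wrong when the two kinds of strings meet, here is a small sketch that doesn't involve HTML::Template at all. It assumes $template_bytes plays the role of a template read as binary data and $db_text the role of an already decoded value; both names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Undecoded UTF-8 bytes for "Name: å", as a binary read would deliver them.
my $template_bytes = "Name: \xC3\xA5";

# A properly decoded text string holding the single character "ö".
my $db_text = decode('UTF-8', "\xC3\xB6");

# Perl concatenates without complaint, treating every byte of the
# binary string as one Latin-1 character.
my $mixed = $template_bytes . $db_text;

# Encoding the result double-encodes the template's "å" (4 bytes
# instead of 2), while the decoded "ö" comes out correctly.
my $out = encode('UTF-8', $mixed);
```

Leaving the encode step out doesn't help either: then the template part would be right and the decoded part would be the one that breaks.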

One solution is to use HTML::Template::Compiled, which is a drop-in replacement for HTML::Template, and which has the open_mode option to new - just create your templates with

    use HTML::Template::Compiled;
    my $t = HTML::Template::Compiled->new(
        filename  => 'mytemplate.phtml',
        open_mode => '<:encoding(UTF-8)',
    );

Another "solution" is to encode every string that is passed to HTML::Template, but this will make your code explode (in terms of size, anyway).
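If you go that way anyway, you can at least keep the encoding in one place. The following is only a sketch: encode_params is a hypothetical helper (not part of HTML::Template or Encode) that walks a parameter structure and encodes every value, leaving the hash keys alone on the assumption that parameter names are plain ASCII.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return a copy of the structure in which every
# defined scalar value has been encoded to UTF-8 bytes.
sub encode_params {
    my ($data) = @_;
    if (ref $data eq 'HASH') {
        my %copy;
        $copy{$_} = encode_params($data->{$_}) for keys %$data;
        return \%copy;
    }
    if (ref $data eq 'ARRAY') {
        return [ map { encode_params($_) } @$data ];
    }
    return defined $data ? encode('UTF-8', $data) : undef;
}

# Illustrative parameters: text strings containing "åäö" and "ö".
my %params = (
    title => "\x{E5}\x{E4}\x{F6}",
    rows  => [ { name => "\x{F6}" } ],
);
my $encoded = encode_params(\%params);
```

You would then pass %$encoded to the template's param method instead of the original hash. It still has to be called before every param call, which is exactly the bloat mentioned above.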

Update: a few debugging tips when dealing with charset issues:

2nd update: you might have confused "encode" and "decode" - you have to decode input data from the outside (Foreign data -> Perl text strings) and you have to encode data in the other direction (Perl text strings -> Rest of World).

Re^2: UTF-8 webpage output from MySQL
by boboson (Monk) on Jan 22, 2008 at 12:24 UTC
    I thought these lines in my CGI::Application base class took care of the input and output encoding:
    binmode STDIN, ":encoding(utf8)";
    binmode STDOUT, ":encoding(utf8)";
    and that my main problem was the data from the database.
      In CGI scripts STDIN is only used to read POST data, so that's not all that interesting.

      But the problems with the templates remain - as long as you use HTML::Template, you'll have to be very careful not to mix binary and text strings. So if you don't want to waste your sanity on charset issues, you should really switch to a template system that is aware of character encodings.

      And since your templates use HTML::Template syntax, I recommend one of the drop-in replacements: HTML::Template::Compiled or Template::Alloy.

      The line decode_utf8($tmpl->output); in the OP demonstrates that you decode the template's output. So if HTML::Template provides you with binary data, and DBI returns upgraded data (aka text strings), your problem actually occurred much earlier.

        I lost my sanity over this a long time ago...

        I do not want to switch template systems. I can't be the only one using H::T with UTF-8, can I?

        I don't use the code below:

        decode_utf8($tmpl->output);
        It was just to show that if I did use it, my template files would display the reversed question mark sign instead of å, ä and ö, but my database data would display correctly.