in reply to Bug in Template?

Hello packetstormer

You said
> (using the same dbh call with UTF8 enabled)
That may mean the strings are in Perl's internal (decoded) utf8 form. If so, encode them to external UTF-8 before you pass them to Template.

#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);
use Template;

my @chars_not_encoded = ();
my @chars_encoded     = ();
#foreach my $code ( hex('3041') .. hex('3096') ){
foreach my $code ( hex('00C0') .. hex('00F0') ){
    push @chars_not_encoded, chr($code);
    push @chars_encoded, encode('utf8', chr($code));
};
my $t = Template->new();
#corrupt output
$t->process("test.tmpl", {lines=>\@chars_not_encoded}, "log_noenc" ) or die $t->error();
#OK
$t->process("test.tmpl", {lines=>\@chars_encoded}, "log_enc" ) or die $t->error();

And the template (test.tmpl):

<html>
<head>
<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
</head>
<body>
[% FOREACH item IN lines %]
item=#[% item %]#<br>
[% END %]
</body>
</html>

I am also confused about how Template handles encoding, and there seems to be a lot to read about these problems (for example Template::Provider::Encoding)... good luck.

Re^2: Bug in Template?
by remiah (Hermit) on Mar 22, 2012 at 00:53 UTC
    Oh ... it seems I am totally confused.
    I'll post again later, once I have cleared my mind.

      This does not seem to be a problem with Template. I would also like advice on this.

      The é in “Séan” is probably 00E9 in the Unicode table at http://www.utf8-chartable.de/unicode-utf8-table.pl. I thought that decoding it to Perl's internal utf8 and then encoding it as utf8 before passing it to Template would work, but it does not. Even without Template, there is strange behavior.

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Encode qw(is_utf8 encode decode);
      use Template;

      my(@raw, @decoded_internal_utf8, @encoded_raw_utf8, @encoded_internal_utf8);
      my @chars = hex('00C0') .. hex('00F0');   #target characters
      #my @chars = hex('3041') .. hex('3096');  #hiragana
      foreach my $code ( @chars ){
          my($raw, $chr);
          $raw = chr($code);
          if ( is_utf8($raw) ){
              $chr = $raw;
          } else {
              $chr = decode('utf8', $raw);
          }
          push @raw, $raw;
          push @decoded_internal_utf8, $chr;
          push @encoded_raw_utf8,      encode('utf8', $raw);
          push @encoded_internal_utf8, encode('utf8', $chr);
      }
      print "======================\n";
      print "perl=$^X : version=$]\n";
      print "1.###raw\n";
      print "#$_#\n" for @raw;
      print "2.###decoded_intenal_utf8\n";
      #print "#$_#\n" for @decoded_internal_utf8;
      print "3.###encoded_raw_utf8\n";
      print "#$_#\n" for @encoded_raw_utf8;
      print "4.###encoded_internal_utf8\n";
      print "#$_#\n" for @encoded_internal_utf8;

      Strangely, only No. 3 works in this case. I usually print characters the No. 4 way. Japanese characters such as hiragana (for example, '3041' .. '3096') seem to have no problem.

      I saw a similar problem at Why Doesn't Text::CSV_XS Print Valid UTF-8 Text When Used With the open Pragma?. At that time I didn't understand it well and thought a newer version would not have the problem... Is this the same trouble? I tried with 5.012002 and 5.014002. They print exactly the same output except for the version number.

        I'm confused by your code; what is it supposed to demonstrate? perlunitut: Unicode in Perl warns against using is_utf8, so I wouldn't use it.

        Consider

        $ perl -le " print chr hex q/C0/ " | od -tx1
        0000000 c0 0d 0a
        0000003
        when viewed as Windows-1252 it is À

        And this

        $ perl -le " binmode STDOUT , q/:utf8/; print chr hex q/C0/ " | od -tx1
        0000000 c3 80 0d 0a
        0000004
        when viewed as Windows-1252 it is Ã€ but viewed as UTF-8 it is À
        And this

        $ perl -MEncode -le " print decode(q/utf8/, chr hex q/C0/ )" | od -tx1
        Wide character in print at -e line 1.
        0000000 ef bf bd 0d 0a
        0000005
        when viewed as Windows-1252 it is ï¿½ but viewed as UTF-8 it is � (U+FFFD, the replacement character)

        If you search for ef bf bd you'll see lots of questions about this erroneous conversion
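
A minimal sketch of where ef bf bd comes from, assuming Encode's default handling of malformed input: the lone byte C0 is not valid UTF-8, so decode substitutes U+FFFD, and that character is what later gets written out as three bytes. The variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# chr(0xC0) treated as raw input: a single byte, which on its own
# is not a complete UTF-8 sequence.
my $byte = "\xC0";

# With the default CHECK value, decode() does not die on malformed
# input; it substitutes the replacement character U+FFFD.
my $decoded = decode('utf8', $byte);
printf "decoded to U+%04X\n", ord($decoded);

# Encoding that replacement character back to UTF-8 gives the bytes
# that od showed in the pipeline above.
my $reencoded = encode('UTF-8', $decoded);
print "re-encoded bytes: ", unpack('H*', $reencoded), "\n";
```

So the original character is already lost at the decode step; nothing done afterwards can bring C0 back.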

        So if you want chr 192 (  perl -le " print  hex q/C0/ "  prints 192 ) to come out as UTF-8, you have to encode it: characters 0 to 255 are also valid Latin-1, and they are not written out as utf8 by default
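
As a rough illustration of that point, here is a small sketch using only the core Encode module. hex_bytes is a hypothetical helper written for this example (it is not part of Encode); it just renders each byte of a string as two hex digits so the difference is visible without od.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper for this sketch: show every byte of a string
# as two hex digits, separated by spaces.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
}

my $char = chr(0xC0);               # one character, code point 192
my $utf8 = encode('UTF-8', $char);  # explicit conversion to UTF-8 bytes

print 'unencoded: ', hex_bytes($char), "\n";   # c0
print 'encoded:   ', hex_bytes($utf8), "\n";   # c3 80
```

The unencoded string goes out as the single Latin-1 byte, while the encoded one carries the two-byte UTF-8 sequence that a UTF-8 page or terminal expects.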

        $ perl -le " print chr hex q/C0/ " |od -tx1
        0000000 c0 0d 0a
        0000003
        $ perl -le " print chr 255 " |od -tx1
        0000000 ff 0d 0a
        0000003
        $ perl -le " print chr 256 " |od -tx1
        Wide character in print at -e line 1.
        0000000 c4 80 0d 0a
        0000004

        Or, if you want chr 192 to return unicode, use encoding pragma ( utf8 pragma doesn't affect chr )

        $ perl -le " use encoding q/utf8/; print chr 192 " |od -tx1
        0000000 c3 80 0a
        0000003