
The only reason you need that encoding is that CGI::FastTemplate doesn't do what it should with UTF-8 encoded templates. If you want to use multi-byte encoded text files (and this includes templates), the filehandles should be using the correct IO layer, which you can set via binmode.

The only reason use utf8 / use encoding "utf8" fixes the problem is that eval STRING will then assume any string passed to it is UTF-8 encoded, even if the string that's eval()d isn't marked as UTF-8. That is arguably not even correct behaviour - I would definitely argue it's a bug.

#!/usr/bin/perl -w
use strict;

# Each read below consumes one line from DATA, so DATA needs five lines.
use utf8;
my $str1 = <DATA>;              # not utf8
my $str2 = eval "'".<DATA>."'"; # utf8

no utf8;
my $str3 = <DATA>;              # not utf8
my $str4 = eval "'".<DATA>."'"; # not utf8

binmode(DATA,":utf8");          # THIS is what you should do.
my $str5 = <DATA>;              # utf8

# Report whether each string carries perl's internal utf8 flag.
print "str1 is",utf8::is_utf8($str1) ? "" :" not"," utf8\n";
print "str2 is",utf8::is_utf8($str2) ? "" :" not"," utf8\n";
print "str3 is",utf8::is_utf8($str3) ? "" :" not"," utf8\n";
print "str4 is",utf8::is_utf8($str4) ? "" :" not"," utf8\n";
print "str5 is",utf8::is_utf8($str5) ? "" :" not"," utf8\n";
__DATA__
État
État
État
État
État

update: all of the above is interesting, but it is not the correct explanation, since the CGI::FastTemplate documentation claims it doesn't use eval().

update 2: I still think there's NEVER any reason to use both utf8 and encoding "utf8" at the same time. They are more or less equivalent anyway. Please try using just one or the other and see if that fixes your problem. Using both does cause problems, as you can see.

Also, if at all possible, the best solution would be to patch CGI::FastTemplate to open the template files with the correct IO layer instead of relying on this ugly hack. Maybe you can open the template files yourself, set the binmode and pass the filehandle or file content to CGI::FastTemplate? That might also fix the issue.
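To make the "open the template files yourself" idea concrete, here is a minimal sketch, not a tested fix for your setup. It only shows reading a file through a decoding IO layer so that you end up with a character string; read_template is a hypothetical helper name, the scratch file stands in for a real template, and how you then hand the content to CGI::FastTemplate is left open because I haven't checked which of its methods accept it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not part of CGI::FastTemplate): slurp a file
# through a decoding IO layer so the result is a character string.
sub read_template {
    my ($path) = @_;
    open(my $fh, '<:encoding(UTF-8)', $path)
        or die "Cannot open $path: $!";
    local $/;                      # slurp mode: read the whole file at once
    my $content = <$fh>;
    close($fh);
    return $content;
}

# Demo only: write the raw UTF-8 bytes for "État" to a scratch file ...
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);                     # raw bytes, no encoding layer
print {$out} "\xC3\x89tat\n";      # "É" is the two bytes C3 89 in UTF-8
close($out);

# ... and read it back as characters.
my $template = read_template($path);
chomp $template;

printf "length is %d, string %s utf8\n",
    length($template),
    utf8::is_utf8($template) ? "is" : "is not";
```

The point of the length check is that a correctly decoded "État" is four characters, whereas the same data read without a layer is five bytes.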

"What should it profit a man, if he should win a flame war, yet lose his cool?"

In reply to Re^5: utf8, locale and regexp by Joost
in thread utf8, locale and regexp by Anonymous Monk
