I constantly run into trouble while making my web apps Unicode-safe.

My usual workflow is

I believe this is the only sane approach; otherwise you lose track of which string is a text string and which is not.

The problems start when I put non-ASCII characters into the templates. I looked through the documentation of HTML::Template (I submitted a patch for this one), HTML::Template::Compiled (tinita says she's working on it), Template, Text::Template and Template::Simple, and none of them even mentions encoding issues (I searched for 'encoding', 'charset', 'utf8' and 'utf-8' in the docs).

The problem is this: when I write non-ASCII characters into the template files, the template engine doesn't decode them into text strings. If I then supply text strings to populate the templates, I end up with a mix of text strings and binary strings.
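To illustrate the mix, here is a minimal sketch that uses only the core Encode module. The strings and variable names are made up for the example, and no real template engine is involved; the concatenation just stands in for what an engine does when it fills undecoded template bytes with decoded values.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical template fragment as a naive engine would hold it:
# raw UTF-8 bytes that were never decoded ("Gruesse" with u-umlaut and sharp s).
my $template_bytes = "Gr\xC3\xBC\xC3\x9Fe, ";

# A value supplied by the application as a decoded text string.
my $name_text = decode('UTF-8', "J\xC3\xBCrgen");

# Mixing the two: Perl treats the template bytes as Latin-1 characters,
# so encoding the result encodes the template part a second time.
my $mixed = encode('UTF-8', $template_bytes . $name_text);

# Decoding the template first keeps everything in text strings.
my $clean = encode('UTF-8', decode('UTF-8', $template_bytes) . $name_text);

# The byte sequence a correctly encoded page should contain.
my $expected = "Gr\xC3\xBC\xC3\x9Fe, J\xC3\xBCrgen";

printf "mixed is correct: %s\n", $mixed eq $expected ? 'yes' : 'no';
printf "clean is correct: %s\n", $clean eq $expected ? 'yes' : 'no';
```

Only the second variant should produce the intended bytes; in the first, the template part comes out double-encoded.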

Now comes my question: Which template system provides sane handling of encodings? For me that's a good reason to switch to such a module.

My idea of "sane" is something along these lines: On opening the template files I can specify an encoding in which I want the file to be opened, and the template engine handles everything as text strings internally. Any other notion of what "sane" could mean is greatly appreciated as well.
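Roughly what I have in mind, as a sketch rather than a real module: `slurp_template` and `fill` are hypothetical helpers written for this example, and the `{name}` placeholder syntax is invented. The template is read through an `:encoding(...)` layer, everything in between is a text string, and encoding happens once on output.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper, not part of any template module: read a template
# file through an encoding layer so that the result is a text string.
sub slurp_template {
    my ($path, $encoding) = @_;
    open my $fh, "<:encoding($encoding)", $path
        or die "Cannot open $path: $!";
    local $/;    # slurp mode: read the whole file at once
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Toy substitution standing in for a real template engine:
# replaces {key} with the matching value, or with '' if there is none.
sub fill {
    my ($template, %vars) = @_;
    $template =~ s!\{(\w+)\}!defined $vars{$1} ? $vars{$1} : ''!ge;
    return $template;
}

# Write a UTF-8 encoded template to a temporary file for the demo.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh, ':raw';
print {$tmp_fh} "Gr\xC3\xBC\xC3\x9Fe, {name}!\n";
close $tmp_fh;

my $template = slurp_template($tmp_path, 'UTF-8');
my $page     = fill($template, name => "J\x{FC}rgen");    # text string

# Encode exactly once, at the output boundary.
binmode STDOUT, ':encoding(UTF-8)';
print $page;
```

A template module with sane encoding handling would take the place of both helpers and accept the encoding as a constructor or per-file option.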

Rant: are encoding issues really so rare and unimportant that 4* out of the 5 modules I've looked at don't seem to care about them?

Is this a cultural issue? I could imagine that people whose native language can be fully expressed in ASCII characters tend not to care too much about charsets.

(*) Regarding Template::Toolkit: a friend told me it has an ENCODING option, but I couldn't find it in the docs. For me this is equivalent to "it doesn't exist".

Grepping through the TT tarball from CPAN, I found the note "The ENCODING options needs testing and documenting." in the TODO file, so at least there is some awareness.

Update 1: small formatting updates.

In reply to Handling Encoding in Templates by moritz
