The problem is that it's a very large application, so breaking out anything self-contained is not possible.

What I did notice is that everything was working just fine: my Template Toolkit templates have BOMs, my DB is all UTF-8 encoded, and my charsets were perfect.

Everything worked great, probably because Perl was doing the right thing, but don't forget there are SIX places for UTF-8 to get messed up:
1) Template encoding
2) HTTP headers
3) HTML headers
4) DB encoding
5) DB handle
6) The language itself

That's suddenly a lot of room for forgetting one detail that throws everything else off.

With a small change to the application, Perl's internal "guessing" was suddenly wrong 50% of the time, and the only way to fix it was to run EVERY piece of data coming from the database through Encode. And not only does it have to be Encoded, it has to be checked FIRST, because if you don't, Encode.pm spews warnings like an 18 year old after a bottle of Jack Daniels.
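For anyone wondering what "check first" can look like, here is a minimal sketch using only the core Encode module. The helper name ensure_decoded is made up for illustration, and it assumes the database hands back either raw UTF-8 bytes or strings Perl has already flagged as characters. The utf8 flag is only a heuristic for "already decoded", so treat this as a workaround sketch rather than the right answer for every driver.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn a value fetched from the DB into a character string
# without decoding it twice.
sub ensure_decoded {
    my ($value) = @_;
    return $value if !defined $value;        # NULL columns stay undef
    return $value if utf8::is_utf8($value);  # already flagged as characters, leave alone

    my $copy  = $value;                      # decode() with a CHECK value may modify its input
    my $chars = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };

    # If the bytes were not valid UTF-8, hand them back untouched instead of dying.
    return defined $chars ? $chars : $value;
}

my $name = ensure_decoded("caf\xC3\xA9");    # 5 bytes in, 4 characters out
```

Calling it a second time on its own output is a no-op, which is the whole point: data that sometimes arrives decoded and sometimes doesn't ends up in one consistent state.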

And what I believe is happening is that for 90% of the people out there working with UTF-8, the "guessing" that Perl does works most of the time. The problem remains that the only way to be certain seems to be to encode/decode all input and output, and that's just not the way things should work: 10% of my programming should not have to be worrying about this issue.
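To make the "encode/decode all input and output" idea concrete, here is a small sketch of the pattern with core Encode only: decode once where the bytes come in, work on characters, encode once where they go out. The sample string is made up, and a real application would do these two steps at the DB handle and the output layer rather than inline.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $raw = "na\xC3\xAFve";             # UTF-8 bytes as they might arrive (6 bytes)

# Without decoding, uc() only touches the ASCII letters; the two bytes of the
# i-diaeresis are left as they are.
my $wrong = uc $raw;

# Decode at the input boundary, operate on characters, encode at the output boundary.
my $text  = decode('UTF-8', $raw);    # 5 characters
my $upper = uc $text;                 # Unicode-aware uppercasing
my $out   = encode('UTF-8', $upper);  # back to bytes for the wire or the DB
```

The difference between $wrong and $out is the kind of thing that stays invisible while the "guessing" happens to work and then bites when one layer changes.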

Tosh

In reply to Re^3: Mugged by UTF8, this CANNOT be right by tosh
in thread Mugged by UTF8, this CANNOT be right by tosh
