At work, we've got the legacy problem that our servers and content are Latin-1-centric. I've written an application that uses Twig as a data store, with the encoding set to latin-1. This seems to work amazingly well (thanks mirod), and any utf8 or HTML characters that might have snuck in are escaped. This utf8 to latin-1/HTML-escaped character conversion is exactly what I want to do to my data before sending it to the browser.

The problem is that Twig dutifully undoes this escaping unless I turn keep_encoding on. That option, as warned in the docs, doesn't work so well. It leaves the HTML escaped and seems to double-escape the Unicode.

The Twig documentation explains that, even with output_encoding set, the values returned by text() and att() are utf8. I have considered converting at every single element access, but that is maybe 50 or so points in my code, and it makes these already dense functions denser.

I've also considered just rendering my browser page and then converting the entire string, but that is even harder to do once the utf8 variables have been substituted into latin-1 templates. So I guess my question is: does anyone know of an easier way? I'll likely end up going through and doing that conversion at every call to XML::Twig::Elt->text() and att(), but I'd really rather not.
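
For reference, the per-call conversion described above could be kept in one place with a small helper. This is only a minimal sketch, not something taken from the Twig docs: it assumes text() and att() hand back ordinary Perl character strings, and the names latin1_html, elt_text and elt_att are hypothetical. It relies on the FB_HTMLCREF fallback in the core Encode module, which turns characters that do not fit in Latin-1 into decimal references.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn a Perl character string (such as the value
# returned by text() or att()) into Latin-1 bytes that are safe for HTML.
sub latin1_html {
    my ($str) = @_;
    return '' unless defined $str;

    # Escape markup characters first, so the '&' of the numeric
    # references produced below is not escaped a second time.
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $str =~ s/([&<>"])/$entity{$1}/g;

    # Characters outside Latin-1 become decimal references (&#NNN;).
    # LEAVE_SRC keeps Encode from modifying $str in place.
    return encode('iso-8859-1', $str,
                  Encode::FB_HTMLCREF() | Encode::LEAVE_SRC());
}

# Hypothetical wrappers for the call sites; they only assume that the
# element object has text() and att() methods.
sub elt_text { my ($elt) = @_;        return latin1_html($elt->text) }
sub elt_att  { my ($elt, $name) = @_; return latin1_html($elt->att($name)) }
```

Each call site would then change from $elt->text to elt_text($elt), which is still 50 or so edits but keeps the encoding logic out of the dense functions. Whether the markup characters should be escaped here depends on how the templates treat the substituted values, so that part is an assumption too.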


In reply to XML, Twig, and character encoding by seuratt
