seuratt has asked for the wisdom of the Perl Monks concerning the following question:

At work, we've got the legacy problem that our servers and content are latin-1 centric. I've written an application that uses Twig as a data store, with the encoding set to latin-1. This seems to work amazingly well (thanks mirod), and any utf8 or HTML characters that might have snuck in are escaped. This utf8 to latin-1/HTML-escape character conversion is exactly what I want to do to my data before sending it to the browser.

The problem is that Twig dutifully undoes this escaping unless I turn keep_encoding on. That option, as warned in the docs, doesn't work so well. It leaves HTML escaped and seems to double-escape the Unicode.

The documentation of Twig explains that, even with output_encoding set, the values of any text() or attr() functions are utf8. I have considered converting at every single element access, but this is maybe 50 or so points in my code and makes these dense functions denser.

I've considered just rendering my browser page and then applying a conversion to the entire string, but that is even harder to do once the utf8 variables are substituted into latin-1 templates. So I guess my question is: does anyone know of an easier way? I'll likely end up going through and doing that conversion at every call to Twig::Elt->text() and attr(), but I'd really rather not.
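
For reference, the per-call conversion described above might look roughly like the sketch below. It assumes the core Encode module is acceptable for the job; the helper name to_latin1 is made up for illustration, and FB_HTMLCREF is Encode's fallback that writes characters with no Latin-1 equivalent as decimal numeric entities.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: convert a Perl character string (what Twig hands
# back) into Latin-1 bytes. Characters outside Latin-1 become &#NNNN;.
sub to_latin1 {
    my ($text) = @_;
    return $text unless defined $text;   # missing attributes stay undef
    my $copy = $text;                    # work on a copy, leave the caller's string alone
    return encode('iso-8859-1', $copy, Encode::FB_HTMLCREF());
}

# U+00E9 fits in Latin-1, U+263A does not.
my $bytes = to_latin1("caf\x{e9} \x{263A}");   # "caf\xE9 &#9786;"
```

Every value read from the twig would then have to be wrapped in this call, which is the repetition the question is trying to avoid.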

Replies are listed 'Best First'.
Re: XML, Twig, and character encoding
by mirod (Canon) on Mar 09, 2005 at 17:00 UTC

    Could you just subclass XML::Twig::Elt and use the elt_class to have XML::Twig use the new class? The new class would use base 'XML::Twig::Elt'; and simply redefine text and attr methods with the conversion added there.
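
    A minimal sketch of that subclass, under a few assumptions: the class name My::Latin1Elt is invented, the conversion uses Encode with FB_HTMLCREF (one possible choice, not necessarily the one the application needs), and the attribute accessor is spelled att, which is the name XML::Twig::Elt uses.

```perl
use strict;
use warnings;

package My::Latin1Elt;    # hypothetical class name, not part of XML::Twig

use XML::Twig;            # load first: this file defines XML::Twig::Elt
use base 'XML::Twig::Elt';
use Encode ();

# Assumed conversion: Latin-1 bytes, decimal entities for everything else.
sub _to_latin1 {
    my ($text) = @_;
    return $text unless defined $text;
    my $copy = $text;
    return Encode::encode('iso-8859-1', $copy, Encode::FB_HTMLCREF());
}

# Same accessors as the parent class, with the conversion added.
sub text {
    my $self = shift;
    return _to_latin1($self->SUPER::text(@_));
}

sub att {
    my $self = shift;
    return _to_latin1($self->SUPER::att(@_));
}

package main;

# elt_class tells XML::Twig which class to use for the elements it builds.
my $twig = XML::Twig->new(elt_class => 'My::Latin1Elt');
$twig->parse('<doc title="na&#239;ve &#8212;">caf&#233;</doc>');

my $text  = $twig->root->text;           # "caf\xE9" as Latin-1 bytes
my $title = $twig->root->att('title');   # "na\xEFve &#8212;"
```

    Since text gathers the text of child elements through the same method, the conversion may run more than once on the same string. That should be harmless with this particular conversion, because re-encoding Latin-1 bytes and ASCII entities leaves them unchanged.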

    Also, for elements that do not include nested elements, maybe you could use the xml_string method, which strips out the outer tag and gives you the text in the output_encoding. This would not work for attributes though.

    If this does not give you enough information to solve the problem, post a bit of code that shows your problem, along with the input and expected output. Thanks