blitzkrieg has asked for the wisdom of the Perl Monks concerning the following question:

I can't seem to get XML::Twig to properly output an element whose text contains HTML tags. I'm trying to generate an XML document as input to another program that doesn't like CDATA sections.
Input:
<pre>won't escape properly - uggh!</pre>
Desired Output:
&lt;pre&gt;won't escape properly - uggh!&lt;/pre&gt;
Actual Output:
&lt;pre>won't escape properly - uggh!&lt;/pre>
or
&amp;lt;pre&gt;won't escape properly - uggh!&amp;lt;/pre&gt;
The code snippet below produces the latter result; you can get the former by commenting out the output_text_filter line:
use XML::Twig;

my $tTwig = XML::Twig->new(
    pretty_print       => 'indented',
    output_text_filter => 'html',
);
$tTwig->set_xml_version( '1.0' );
$tTwig->set_output_encoding( 'utf-8' );

my $tLog = XML::Twig::Elt->new( 'log', q(<pre>won't escape properly - uggh!</pre>) );
$tTwig->set_root( $tLog );
$tTwig->print;
blitzkrieg

Re: XML::Twig and HTML Entities
by bpphillips (Friar) on Nov 12, 2004 at 19:23 UTC
    Maybe I'm missing something, but the output looks exactly as I would expect. Can you post an example of what you'd prefer XML::Twig to output?
      Here's what the code snippet actually generates:
      <?xml version="1.0" encoding="utf-8"?>
      <log>&amp;lt;pre&gt;won't escape properly - uggh!&amp;lt;/pre&gt;</log>
      Here's what I want:
      <?xml version="1.0" encoding="utf-8"?>
      <log>&lt;pre&gt;won't escape properly - uggh!&lt;/pre&gt;</log>
      Notice that it runs encode_entities() on the '&' of both '&lt;' sequences, so each one comes out as '&amp;lt;'. I want XML::Twig to encode either everything or nothing, not to do a partial encode.
        Why not? Without an opening '<', a '>' is a normal character and doesn't need to be encoded.
Re: XML::Twig and HTML Entities
by bpphillips (Friar) on Nov 12, 2004 at 20:20 UTC
    This is valid XML (it is what gets generated when you don't have the output_text_filter=>'html' option set):
    <log>&lt;pre>won't escape properly - uggh!&lt;/pre></log>
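    The XML spec says that > characters in element text don't need to be escaped (the one exception is the sequence ]]>). Consequently, when you use output_text_filter=>'html', the filter is applied to text that is already valid, escaped XML: it encodes the & of each &lt; to give &amp;lt; and also encodes each > into &gt;.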
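
To make the two-pass behaviour described above concrete, here is a minimal pure-Perl sketch. It does not call XML::Twig at all: xml_escape_minimal and html_style_filter are hypothetical stand-ins that only model the order of operations inferred from the outputs in this thread (XML escaping first, the text filter second), and html_style_filter is a simplified substitute for a real HTML entity encoder. gt_only_filter is likewise a hypothetical helper showing one possible way to get the desired result, by encoding only the '>' that the first pass leaves alone.

```perl
use strict;
use warnings;

# Pass 1 (assumed model of the XML serializer): minimal escaping of
# element text. Only '&' and '<' are replaced; '>' is left untouched.
sub xml_escape_minimal {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    return $text;
}

# Pass 2 (simplified stand-in for an HTML-style entity filter):
# encodes '&', '<' and '>' in whatever text it receives.
sub html_style_filter {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical alternative for pass 2: encode only '>' and leave the
# entities produced by pass 1 alone.
sub gt_only_filter {
    my ($text) = @_;
    $text =~ s/>/&gt;/g;
    return $text;
}

my $input = q(<pre>won't escape properly - uggh!</pre>);

my $no_filter   = xml_escape_minimal($input);     # no text filter
my $html_filter = html_style_filter($no_filter);  # filter run on escaped text
my $gt_filter   = gt_only_filter($no_filter);     # '>'-only filter

print "no filter:   $no_filter\n";
print "html filter: $html_filter\n";
print "gt only:     $gt_filter\n";
```

The first two printed lines reproduce the two "actual" outputs from the question, and the third matches the desired output. The XML::Twig documentation describes output_text_filter as accepting a subroutine reference as well as a predefined filter name, so passing something like gt_only_filter there may be worth trying, though that is untested here.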