bpaulsen has asked for the wisdom of the Perl Monks concerning the following question:

There must be an easy way to do this, but I can't figure it out... I have characters like '>' and '"' in my XML and when I try to get the text from XML::Twig, it doesn't unescape those characters. This seems like something that everybody would want to happen for them, so how do I get them unescaped? Here's my sample code and results:
#!/usr/local/bin/perl5 -w
use XML::Twig;

my $twig = new XML::Twig( TwigRoots => { "LINK" => 1 } );
$twig->parse( "<?xml version=\"1.0\"?><TEST><LINK><TITLE>This &lt; is a &gt; &quot;test&apos; &amp; don&apos;t you forget it</TITLE></LINK></TEST>" );
my $root = $twig->root;
my @links = $root->children;
print $links[0]->first_child_text( "TITLE" ), "\n";

Results:
This &lt; is a &gt; &quot;test&apos; &amp; don&apos;t you forget it

Replies are listed 'Best First'.
Re: Unescape characters from XML::Twig
by mirod (Canon) on Apr 20, 2001 at 20:03 UTC

    My mistake! When I first wrote XML::Twig I only used it to output back XML. Outputting HTML or plain text came later. So there you have a bug!

    The next version (XML::Twig 3.0) will fix this. I will properly store the ' as ' and only turn it into &apos; when using the print or sprint methods; the text method will leave it as '.

    In the meantime you can unescape the text using this:

    sub unescape {
        my $text= shift;
        $text=~ s/&lt;/</g;
        $text=~ s/&gt;/>/g;
        $text=~ s/&quot;/"/g;
        $text=~ s/&apos;/'/g;
        $text=~ s/&amp;/&/;
        return $text;
    }
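
    The same unescaping can also be sketched as a single substitution driven by a lookup table. This is only an illustration: it assumes the text contains nothing but the five predefined XML entities, and the names unescape_xml and %ENTITY are made up for the example, not part of XML::Twig. Because the text is scanned once, a decoded & is never re-read as the start of another entity, so the order of the table entries does not matter.

```perl
use strict;
use warnings;

# Hypothetical lookup table: only the five predefined XML entities.
my %ENTITY = (
    lt   => '<',
    gt   => '>',
    quot => '"',
    apos => "'",
    amp  => '&',
);

# Hypothetical helper, not part of XML::Twig.
sub unescape_xml {
    my $text = shift;
    # Single pass: each entity is replaced once and the result is not re-scanned.
    $text =~ s/&(lt|gt|quot|apos|amp);/$ENTITY{$1}/g;
    return $text;
}

print unescape_xml('This &lt; is a &gt; &quot;test&apos; &amp; don&apos;t you forget it'), "\n";
```

    Numeric character references (such as &#38;) are deliberately left alone in this sketch.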

    By the way, if you have tag-heavy XML you can also use CDATA sections: if you replace your string with the following one it will work (nearly: there is a definite bug there, as opposed to a poor design decision, which turns the & into &amp;):

     qq{<?xml version=\"1.0\"?><TEST><LINK><TITLE><![CDATA[This < is a > " test ' & don't you forget it]]></TITLE></LINK></TEST>}
      Minor correction, the line:
      $text=~ s/&amp;/&/;
      should be:
      $text=~ s/&amp;/&/g;.
Re: Unescape characters from XML::Twig
by arturo (Vicar) on Apr 20, 2001 at 19:39 UTC

    You can use the decode_entities function from HTML::Entities to do this if there's no method in XML::Twig. But I believe a /msg to XML::Twig's author might be in order if you don't find anything in the documentation or the source code to the module.
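
    A minimal sketch of that approach, assuming HTML::Entities (from the HTML-Parser distribution on CPAN) is installed. The sample string is made up for illustration. Note that decode_entities also decodes HTML-only named entities such as &eacute;, which XML does not predefine, so it is a little broader than what the XML strictly needs.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# Illustrative input, similar to the text returned by XML::Twig in the question.
my $escaped = 'This &lt; is a &gt; &quot;test&quot; &amp; more';

# Called with a return value in use, decode_entities returns a decoded copy.
my $plain = decode_entities($escaped);

print "$plain\n";
```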

    HTH