Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Question for my dear PerlMonks: I have an XML file that I am parsing. One of its elements, 'content', contains HTML. I am parsing the file with XML::Simple's XMLin, like so:
$xml = eval { $data->XMLin($webpage, forcearray => 1, suppressempty => '') };
When I dump the resulting hash, I find that XML::Simple has parsed the HTML into the hash as nested elements. That is not what I want; I just want to grab the content inside that element as-is. How do I do this? Thank you!
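A minimal reproduction of what the question describes might look like the sketch below. It assumes XML::Simple (with an XML::SAX or XML::Parser backend) is installed; the sample document and its `<doc>` root are hypothetical stand-ins for the real `$webpage`, while the options are the ones from the question.

```perl
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

# Hypothetical input: the HTML inside <content> is real child markup
my $webpage = '<doc><content><p>Hello <b>world</b></p></content></doc>';

my $parser = XML::Simple->new;
my $xml = eval { $parser->XMLin($webpage, ForceArray => 1, SuppressEmpty => '') }
    or die "XMLin failed: $@";

# The <p> and <b> tags come back as nested hashes and arrays, not as a string
print Dumper($xml);
```

In the dump, `p` and `b` show up as hash keys rather than as part of a string, which is exactly the behaviour the question complains about.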

Re: Simplexml parsing html in xml entity in hash
by ikegami (Patriarch) on Apr 14, 2010 at 20:58 UTC

    The element's value isn't HTML (or XHTML) text. If it were, the markup would be escaped, like this:

    <?xml version="1.0"?> <root> <element> &lt;p&gt;Rock &amp;amp; Roll&lt;/p&gt; </element> </root>

    Rather, the element has XHTML children.

    <?xml version="1.0"?>
    <root xmlns:html="http://www.w3.org/1999/xhtml">
       <element>
          <html:p>Rock &amp; Roll</html:p>
       </element>
    </root>

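    Faced with the latter, you'd normally just ask the parser for the XML of the node (for example, the ->toString() method of an XML::LibXML node). However, that won't work with XML::Simple, because its data structure is too lossy to represent typical XHTML: among other things, it loses the order of child elements and of text interleaved with them.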
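A minimal sketch of that approach, swapping in XML::LibXML in place of XML::Simple (assuming it is installed). It parses the second example above, finds the element with the XPath `/root/element` (specific to this sample), and serializes each child node with ->toString, joining the pieces to get the element's inner XML with its markup and escaping intact.

```perl
use strict;
use warnings;
use XML::LibXML;

# The second example above: <element> has real XHTML child elements
my $webpage = <<'END_XML';
<?xml version="1.0"?>
<root xmlns:html="http://www.w3.org/1999/xhtml"><element><html:p>Rock &amp; Roll</html:p></element></root>
END_XML

my $doc = XML::LibXML->load_xml(string => $webpage);
my ($element) = $doc->findnodes('/root/element');

# Serialize every child node (elements and text alike) and concatenate them;
# the result is the "inner XML" of <element>
my $inner = join '', map { $_->toString } $element->childNodes;
print "$inner\n";
```

If you want only the text without the tags, `$element->textContent` returns it with entities decoded.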