tosh has asked for the wisdom of the Perl Monks concerning the following question:

Is it too simplistic of me to expect XML::Parser to have a NOPARSE option, like this:
NOPARSE => ['a', 'b', 'i', 'img']
Then the parser would pass any <a>, <b>, <i>, or <img> tags it found in an XML string through as plain text instead of parsing them.

Would that break what was being returned? Is there a way around this? Is it stupid of me to expect HTML in my XML?

Thanks!!!

Tosh

Re: Isn't there a NOPARSE option for XML::Parser ?
by grantm (Parson) on Sep 09, 2002 at 00:05 UTC

    Is it stupid of me to expect HTML in my XML?

    I suspect you wouldn't like my first response to that question :-) so I'll move rapidly on to the second...

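    The sort of thing you describe would not be XML, end of story. XML requires every element to be closed and properly nested, so HTML-style tags such as <br> or <img> with no closing tag make the document ill-formed. This does not mean that the thing you describe would have no value, simply that it would not be XML, and therefore you could not use the myriad XML tools (e.g. XML::Parser) to work with it.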
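To make the point concrete, here is a minimal sketch of what XML::Parser does with HTML-style markup. The helper name `is_well_formed` and both sample strings are made up for illustration; the only library behaviour relied on is that `parse` dies on the first well-formedness error.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: true if XML::Parser accepts the string, false otherwise.
sub is_well_formed {
    my ($source) = @_;
    # parse() dies on the first well-formedness error, so trap it with eval.
    return eval { XML::Parser->new->parse($source); 1 } ? 1 : 0;
}

# Illustrative inputs: the same content with HTML-style and XML-style empty tags.
my $html_style = '<doc><p>Line one<br>Line two <img src="logo.png"></p></doc>';
my $xml_style  = '<doc><p>Line one<br/>Line two <img src="logo.png"/></p></doc>';

printf "HTML-style tags: %s\n", is_well_formed($html_style) ? 'ok' : 'rejected';
printf "XML-style tags:  %s\n", is_well_formed($xml_style)  ? 'ok' : 'rejected';
```

The first string should be rejected because the unclosed tags leave `</p>` mismatched, while the second parses because every element is explicitly closed.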

    On a more helpful note, you can embed a chunk of HTML in XML as a CDATA section like this:

    <doc>
      <title>This is a test</title>
      <htmlstring><![CDATA[
        <p>Here is some HTML<br>
        It has an IMG tag: <img src="logo.png"> and a BR tag<br>
        </p>
      ]]></htmlstring>
    </doc>
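As a rough sketch of how the CDATA approach plays out with XML::Parser: the parser hands the contents of a CDATA section to the Char handler as ordinary text, so the HTML comes back untouched. The helper name `extract_html` and the `$inside` flag are assumptions for illustration, not part of the module.

```perl
use strict;
use warnings;
use XML::Parser;

# Sample document with the HTML wrapped in a CDATA section, as in the example above.
my $xml = q{<doc><title>This is a test</title>}
        . q{<htmlstring><![CDATA[<p>Here is some HTML<br> It has an IMG tag: }
        . q{<img src="logo.png"> and a BR tag<br></p>]]></htmlstring></doc>};

# Hypothetical helper: collect the character data found inside <htmlstring>.
sub extract_html {
    my ($source) = @_;
    my ($inside, $html) = (0, '');
    my $parser = XML::Parser->new(
        Handlers => {
            # Start and End handlers receive ($expat, $element_name, ...).
            Start => sub { $inside = 1 if $_[1] eq 'htmlstring' },
            End   => sub { $inside = 0 if $_[1] eq 'htmlstring' },
            # Char may fire more than once per text run, so append each chunk.
            Char  => sub { $html .= $_[1] if $inside },
        },
    );
    $parser->parse($source);
    return $html;
}

print extract_html($xml), "\n";
```

The returned string is raw HTML, so anything further (finding the IMG tag, say) needs an HTML-aware tool rather than the XML parser.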

    Edit: Oh, and I also meant to mention that the XML::LibXML module can parse HTML, so you might find that you can use it to work with your hybrid documents in an XMLish way.
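A minimal sketch of the XML::LibXML route, assuming a version recent enough to provide `load_html` (1.70 or later). The helper name `image_sources` and the sample fragment are illustrative only.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: return the src attribute of every <img> in an HTML string.
sub image_sources {
    my ($html) = @_;
    my $doc = XML::LibXML->load_html(
        string            => $html,
        recover           => 1,   # keep going on markup a strict parser would reject
        suppress_errors   => 1,
        suppress_warnings => 1,
    );
    # Once parsed, the HTML can be queried with XPath like any XML document.
    return map { $_->getAttribute('src') } $doc->findnodes('//img');
}

# Illustrative HTML fragment, e.g. text pulled out of a CDATA section.
my $html = '<p>Here is some HTML<br> It has an IMG tag: '
         . '<img src="logo.png"> and a BR tag<br></p>';

print "$_\n" for image_sources($html);
```

This treats the HTML as a separate document handed to the HTML parser; it is not a way of making the XML parser itself tolerate HTML tags.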