former33t has asked for the wisdom of the Perl Monks concerning the following question:

I have to parse XML formatted messages being sent across a network connection and prefer to use XPath. I thought I was being clever by using code like the below to avoid a fatal error in the network server while parsing the messages:

    use XML::XPath;

    my ($xml, $xp);
    $xml = "not xml";
    eval { $xp = XML::XPath->new(xml => $xml); };
    if ($@) {
        print "Parsing failed\n";
    }
    ...

The problem is that XML::XPath apparently doesn't check for valid XML on object creation. It returns the object without complaint, which is far from ideal for my needs. I don't think anyone will ever send me invalid XML, since I wrote the client apps too, but I don't want to bet a midnight call-in on it either. The solution is to also use XML::Parser and let it do the parsing first. If everything is okay, then we create an XPath object.

    use XML::XPath;
    use XML::Parser;

    my ($xml, $xp);
    $xml = "not xml";
    eval {
        my $po = XML::Parser->new;
        $po->parse($xml);
    };
    if ($@) {
        print "Parsing failed\n";
        # handle error
    } else {
        $xp = XML::XPath->new(xml => $xml);
    }
    ...

The XML::XPath documentation says that the constructor will take a parser argument that is an XML::Parser object, but parsing separately first seems to work just fine too (although it probably has higher overhead, since the message gets parsed twice). What would be ideal is if XML::XPath just did XML validation before returning the object.
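
The check-then-construct approach described above could be wrapped in a small helper so the network server has a single call to make per message. This is only a sketch: the name `xpath_or_undef` and the sample `<msg>` document are made up for illustration, and note that XML::Parser checks well-formedness only, not validity against a DTD or schema.

```perl
use strict;
use warnings;
use XML::Parser;
use XML::XPath;

# Hypothetical helper (not part of either module): returns an XML::XPath
# object if $xml is well-formed, or undef otherwise. On failure the
# parser's error message is left in $@.
sub xpath_or_undef {
    my ($xml) = @_;

    # XML::Parser dies on malformed input; the trailing 1 makes eval
    # return true only when parse() finished without dying.
    my $ok = eval { XML::Parser->new->parse($xml); 1 };
    return undef unless $ok;

    return XML::XPath->new(xml => $xml);
}

# Example use with one bad and one good (invented) message.
for my $msg ('not xml', '<msg><to>bob</to></msg>') {
    if (my $xp = xpath_or_undef($msg)) {
        print "Parsed, to = ", $xp->findvalue('/msg/to'), "\n";
    } else {
        print "Parsing failed: $@";
    }
}
```

Testing the return value of `eval` rather than `$@` alone is a common defensive habit, since `$@` can be clobbered between the failure and the check.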

Replies are listed 'Best First'.
Re: XML::XPath doesn't validate XML???
by jettero (Monsignor) on May 19, 2007 at 20:09 UTC

    People sometimes use the XPath module to parse regular old HTML though. That'd be fairly impossible for random web pages off the net if it insisted on validating first.

    I think having to use an XML parser to validate makes sense really, since XPath is about finding the stuff and parsers are about validating... I could be picturing it wrong, but that makes sense to me.

    -Paul

Re: XML::XPath doesn't validate XML???
by Cody Pendant (Prior) on May 20, 2007 at 03:33 UTC
    I think jettero's right: an XSL application isn't necessarily required to validate the XML (or other) document it's being asked to transform.

    If you ask your parser to return the nodeset

    //foo/bar[@baz='bof']
    and you ask it to look for that nodeset in the string "not xml", it's going to return an empty nodeset. Which is the right answer.

    But if you ask it to find the nodeset matching

    //foo/bar[@baz='bof]'
    it will give you something in $@, because that's a malformed XPath expression (the closing quote is in the wrong place).


    ($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
    =~y~b-v~a-z~s; print
      Okay, I can see the use case for that. It just seemed intuitive that if the object was returned then the XML was valid. The error I get when I search for a node that isn't there is from XML::Parser, which XML::XPath inherits from.
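
Going by the behavior reported in this thread (the constructor succeeds, and the XML::Parser error only shows up once a search is run), one alternative is to force a trivial query inside the same `eval` that builds the object, so the message is parsed only once. This is a sketch under that assumption, not something the XML::XPath documentation promises. The helper name `checked_xpath` is invented here.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical helper: build the XML::XPath object and immediately run a
# trivial query so that a deferred parse error, if any, is raised inside
# this eval rather than later in the server's main loop.
sub checked_xpath {
    my ($xml) = @_;

    my $xp = eval {
        my $obj = XML::XPath->new(xml => $xml);
        $obj->findnodes('/*');    # assumed to trigger the actual parse
        $obj;                     # value of the eval block on success
    };

    # $xp is undef if anything died; $@ then holds the error message.
    return $xp;
}

my $xp = checked_xpath('not xml');
print defined $xp ? "Parsed\n" : "Parsing failed\n";
```

If a given XML::XPath version parses eagerly in the constructor instead, the same `eval` still catches the failure, so the helper should behave the same either way.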