mat21 has asked for the wisdom of the Perl Monks concerning the following question:

Dear all,
I am using XML::LibXML::Reader to parse large XML files, and I am trying to use some methods based on XPath.
There is something (probably simple) that I don't understand.
For instance, I create a pattern like this:
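use XML::LibXML;            # also provides XML::LibXML::Pattern
use XML::LibXML::Reader;

my $reader = XML::LibXML::Reader->new(location => $xmlfile)
    or die "cannot read $xmlfile\n";
my $pattern = XML::LibXML::Pattern->new('//entry');   # compile once, outside the loop
while ($reader->nextPatternMatch($pattern) == 1) {
    print "match: ", $reader->name, "\n";
}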
There is no match, although the XML file contains many entry elements (same behaviour for any tag).
The root element of the XML file declares a default namespace and a schema location:
<uniprot xmlns="http://uniprot.org/uniprot" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://uniprot.org/uniprot http://www.uniprot.org/support/docs/uniprot.xsd">
If I replace it with a bare <uniprot>, all my XPath queries work. I guess I have to do something to declare the schema, but the tags do not have any prefix, unlike the examples in the documentation.
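A quick way to see why the pattern fails is to print the local name and namespace URI that the reader reports for each element. This is a minimal sketch: element_names is a hypothetical helper, and the sample string is made up, reduced to the same default namespace as the UniProt root element.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Hypothetical helper: return "localName {namespaceURI}" for every element start tag.
sub element_names {
    my ($xml) = @_;
    my $reader = XML::LibXML::Reader->new(string => $xml)
        or die "cannot parse XML string\n";
    my @names;
    while ($reader->read == 1) {
        next unless $reader->nodeType == XML_READER_TYPE_ELEMENT;
        my $ns = $reader->namespaceURI;
        $ns = '(none)' unless defined $ns && length $ns;
        push @names, $reader->localName . " {$ns}";
    }
    return @names;
}

# Made-up sample using the same default namespace as the UniProt file.
my $sample = '<uniprot xmlns="http://uniprot.org/uniprot">'
           . '<entry><accession>X00001</accession></entry>'
           . '</uniprot>';
print "$_\n" for element_names($sample);
```

With the default namespace declared, every element, entry included, reports http://uniprot.org/uniprot as its namespace URI. An unprefixed name in an XPath-style pattern only matches elements in no namespace, which fits the observation that the queries start working once the xmlns declaration is removed.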
I don't know what to do. Any advice would be welcome.
Thanks

Re: XML::LibXML::Reader and XPATH
by dHarry (Abbot) on Feb 20, 2009 at 11:06 UTC

    This has nothing to do with Perl or libxml; it is about XML (namespaces, to be precise).

    but the tags do not have any prefix like examples in the documentation

    Well, you can add the prefixes, i.e. qualify the tags with the namespace. But the uniprot element declaration looks suspicious to me.
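    In the streaming case, XML::LibXML::Pattern->new accepts a hash that maps prefixes to namespace URIs, so the prefix only has to exist in the pattern, not in the file. A sketch under these assumptions: the prefix u is an arbitrary choice, count_entries is a hypothetical helper, and the sample string stands in for the real file.

```perl
use strict;
use warnings;
use XML::LibXML;            # defines XML::LibXML::Pattern
use XML::LibXML::Reader;

my $UNIPROT_NS = 'http://uniprot.org/uniprot';

# Hypothetical helper: count <entry> elements of the UniProt namespace.
# %source goes straight to the reader: (location => $file) or (string => $xml).
sub count_entries {
    my (%source) = @_;
    my $reader = XML::LibXML::Reader->new(%source)
        or die "cannot open XML source\n";
    # "u" is an arbitrary prefix; only the namespace URI has to match the document.
    my $pattern = XML::LibXML::Pattern->new('//u:entry', { u => $UNIPROT_NS });
    my $count = 0;
    while ($reader->nextPatternMatch($pattern) == 1) {
        # The reader may also stop on an entry's end tag, so count start tags only.
        $count++ if $reader->nodeType == XML_READER_TYPE_ELEMENT;
    }
    return $count;
}

# Made-up sample with the UniProt default namespace.
my $sample = '<uniprot xmlns="http://uniprot.org/uniprot">'
           . '<entry><accession>X00001</accession></entry><entry/>'
           . '</uniprot>';
print count_entries(string => $sample), "\n";
```

    For the real file, the call would be count_entries(location => $xmlfile).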

    You might want to spend some time choosing the right strategy for determining what the default/target namespace should be; see DefaultNamespace.pdf for a discussion, which gives a few examples of how to do it.

    HTH

      Thanks for your answer and the link. It helped me find another interesting link, XPath vs the default namespace, and it is not so simple.
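      The underlying rule: in XPath 1.0 an unprefixed name in a location path matches only elements with no namespace, even when the document declares a default namespace, so the usual workaround is to register a prefix of your own for the namespace URI. This small in-memory sketch (hypothetical helper entry_counts, arbitrary prefix u, made-up sample) compares the two queries; it builds a DOM, so it only illustrates the rule and is not meant for the large files.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical helper: compare an unprefixed XPath with a prefixed one.
sub entry_counts {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);
    my $xpc = XML::LibXML::XPathContext->new($doc);
    # Bind the arbitrary prefix "u" to the document's default namespace URI.
    $xpc->registerNs(u => 'http://uniprot.org/uniprot');
    my @unprefixed = $xpc->findnodes('//entry');    # matches only no-namespace <entry>
    my @prefixed   = $xpc->findnodes('//u:entry');
    return (scalar @unprefixed, scalar @prefixed);
}

my $sample = '<uniprot xmlns="http://uniprot.org/uniprot"><entry/><entry/></uniprot>';
my ($plain, $with_prefix) = entry_counts($sample);
print "//entry: $plain, //u:entry: $with_prefix\n";
```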

      I don't think XML::LibXML::XPathContext can be used together with XML::LibXML::Reader, which is annoying.
      The files I am parsing are really big and I don't want to use DOM or SAX...
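      An XPathContext needs a DOM node rather than a reader, but one workaround that keeps the streaming approach is to move the reader to each entry, copy just that subtree with copyCurrentNode(1), and run XPath on the copy. A sketch, assuming each entry is small enough to expand on its own; first_accessions is a hypothetical helper, the prefix u is arbitrary, and the sample is made up.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;
use XML::LibXML::XPathContext;

my $UNIPROT_NS = 'http://uniprot.org/uniprot';

# Hypothetical helper: stream the input, but expand one <entry> at a time
# into a small DOM copy so that ordinary XPath can be run on it.
sub first_accessions {
    my (%source) = @_;   # (location => $file) or (string => $xml)
    my $reader = XML::LibXML::Reader->new(%source)
        or die "cannot open XML source\n";
    my @acc;
    # nextElement(localname, nsURI) advances to the next matching start tag.
    while ($reader->nextElement('entry', $UNIPROT_NS) == 1) {
        my $entry = $reader->copyCurrentNode(1);    # deep copy of this entry only
        my $xpc   = XML::LibXML::XPathContext->new($entry);
        $xpc->registerNs(u => $UNIPROT_NS);
        push @acc, $xpc->findvalue('u:accession[1]');
    }
    return @acc;
}

# Made-up sample with the UniProt default namespace.
my $sample = '<uniprot xmlns="http://uniprot.org/uniprot">'
           . '<entry><accession>X00001</accession></entry>'
           . '<entry><accession>X00002</accession><accession>X00003</accession></entry>'
           . '</uniprot>';
print "$_\n" for first_accessions(string => $sample);
```

      Only the current entry is copied into a DOM, rather than the whole document.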

        For big files DOM is out of the question, though there are always tricks, of course. With SAX I've parsed big files with good performance (personally I favor Apache's Xalan and Xerces implementations). Although I do use libxml2, I invariably use XML::Twig when I am in a Perl environment; I have parsed files over 1 GB with it. There is also XML::Twig::XPath, but I have never used it; you would have to check whether it solves your problem. The document I mentioned is only one of a (big) series that sets out best practices for XML Schema usage.
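        To try the XML::Twig route on this file, its map_xmlns option maps a namespace URI to a prefix of your choosing, which handlers then use, and purge discards entries once they have been handled. A sketch, assuming XML::Twig is installed; twig_accessions and the prefix u are illustrative choices, and the sample is made up.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: collect the first <accession> of each <entry>,
# discarding each entry once it has been handled.
sub twig_accessions {
    my ($xml) = @_;
    my @acc;
    my $twig = XML::Twig->new(
        # Map the document's namespace URI to the arbitrary prefix "u";
        # handlers and navigation then have to use "u:" names.
        map_xmlns     => { 'http://uniprot.org/uniprot' => 'u' },
        twig_handlers => {
            'u:entry' => sub {
                my ($t, $entry) = @_;
                push @acc, $entry->first_child_text('u:accession');
                $t->purge;    # free everything parsed so far
                return 1;
            },
        },
    );
    $twig->parse($xml);       # use $twig->parsefile($path) for a file on disk
    return @acc;
}

# Made-up sample with the UniProt default namespace.
my $sample = '<uniprot xmlns="http://uniprot.org/uniprot">'
           . '<entry><accession>X00001</accession></entry>'
           . '<entry><accession>X00002</accession></entry>'
           . '</uniprot>';
print "$_\n" for twig_accessions($sample);
```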