in reply to Which XML parser would be the wisest to use

Can your data fit in an XML::LibXML DOM (i.e. XML::LibXML in tree mode)? If it does, then go for it: that will be the fastest you can get in Perl.
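A minimal sketch of tree mode, assuming XML::LibXML 1.70 or later (for `load_xml`) and a small made-up document. The whole input is parsed into a DOM first, and XPath queries then run against the complete in-memory tree. The element and attribute names are purely illustrative.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical input; for a file you would use
# XML::LibXML->load_xml(location => $file) instead.
my $xml = q{<catalog>
  <item id="1"><name>apple</name></item>
  <item id="2"><name>pear</name></item>
</catalog>};

# Parse the whole document into an in-memory DOM tree
my $doc = XML::LibXML->load_xml(string => $xml);

# Query the full tree with XPath and collect one row per <item>
my @rows;
for my $item ($doc->findnodes('/catalog/item')) {
    my $id   = $item->getAttribute('id');
    my $name = $item->findvalue('name');
    push @rows, "$id: $name";
}

print "$_\n" for @rows;
```

The convenience comes at a price: memory grows with the size of the document, which is exactly the "can your data fit" question.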

If not, XML::Parser is probably the fastest you can get: XML::Twig and XML::Rules are both built on it and are usually slower. Surprisingly, I found that SAX, whether based on XML::Parser or on XML::LibXML, is very slow (see the end of Simple Perl XML Benchmark).
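For comparison, a minimal stream-mode sketch using XML::Parser's handler interface. The document and the choice to collect the text of `<name>` elements are hypothetical. Nothing is kept in memory except what the handlers decide to store.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical input document
my $xml = '<catalog><item id="1"><name>apple</name></item>'
        . '<item id="2"><name>pear</name></item></catalog>';

my $in_name = 0;    # true while inside a <name> element
my $text    = '';   # text accumulated for the current <name>
my @names;          # collected results

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $element, %attr) = @_;
            if ($element eq 'name') { $in_name = 1; $text = ''; }
        },
        # Char may be called several times per text node, so append
        Char => sub {
            my ($expat, $string) = @_;
            $text .= $string if $in_name;
        },
        End => sub {
            my ($expat, $element) = @_;
            if ($element eq 'name') { push @names, $text; $in_name = 0; }
        },
    },
);

$parser->parse($xml);
print join(',', @names), "\n";
```

The bookkeeping (`$in_name`, `$text`) is the state that XML::Twig and XML::Rules manage for you, at some cost in speed.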


Re^2: Which XML parser would be the wisest to use
by wardy3 (Scribe) on Feb 21, 2008 at 06:05 UTC
    Thanks, Mirod!

    That's a great table. I noticed XSLT did well at extracting text from elements.

    I had thought about XSLT but assumed it would be too slow. I tried to learn it a few years ago, and maybe it's time to revisit it.

      The thing is, XML::LibXSLT will load the entire document into memory. And as it is based on libxml2, just like XML::LibXML, it probably needs about the same amount of memory as XML::LibXML.

      An alternative solution that I forgot to mention, mostly because I have never tried it and I don't even know if XML::LibXML supports it: libxml2 has a pull mode that you might be able to use to lower the memory requirements of your code (by deleting the parts of the DOM you no longer need). If you go that route, it would be interesting if you could describe how it works, because that could be a good alternative to SAX when processing huge documents.
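      XML::LibXML ships an interface to libxml2's pull parser, XML::LibXML::Reader. A minimal sketch, assuming a made-up document where each `<item>` is a self-contained record: `nextElement('item')` advances the reader to the next `<item>` start tag, and `copyCurrentNode(1)` expands only that record into a small DOM copy. Memory use should then depend on the size of one record rather than the whole document.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Hypothetical input; for a file use XML::LibXML::Reader->new(location => $file)
my $xml = '<catalog><item id="1"><name>apple</name></item>'
        . '<item id="2"><name>pear</name></item></catalog>';

my $reader = XML::LibXML::Reader->new(string => $xml)
    or die "cannot create reader\n";

my @rows;
# nextElement returns 1 when it finds the element, 0 at end of input
while ($reader->nextElement('item') == 1) {
    # Deep-copy just the current <item> subtree into a DOM node
    my $item = $reader->copyCurrentNode(1);
    push @rows, $item->getAttribute('id') . ': ' . $item->textContent;
}

print "$_\n" for @rows;
```

Because the reader moves forward in document order, the copied subtree can use the usual DOM methods while the rest of the document is never held in memory at once.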

        Hi again, mirod

        Thanks for your help

        I've since played a bit with XML::LibXML and it is very fast! I've got my parsing down to 13 seconds.

        I seem to have no issues with memory (unlike XML::XPath), so it seems I was put off this parser unnecessarily.

        I've read a bit about pull mode but haven't been brave enough to try it yet :-). I might if I get time, but 13 seconds is good enough for now. Same for XSLT: learning it is delayed again...

        I'm not sure how best to use XML::LibXML, but I'll post a new thread about it. It didn't seem very well documented for newcomers, unless I'm missing something.

        Thanks again