in reply to Using the XML::Parser Module

I’d advise against XML::Parser at this point, for two reasons: it’s a wrapper around the rather old (if trusty) expat library, and its API is rather hard to program against, because XML was still in a bit of flux back when expat was written.

For processing XML documents, you want to learn about XPath. A pithy description of what it is might be “a pattern match language for trees.” It lets you specify which portion of a document you’re interested in very concisely. Knowing XPath is the difference between XML being a chore or a charm.

XML::Twig does make things much easier, but when I last dealt with it, it did not offer real XPath support and did most of its work on the Perl side. That means large documents are slow to process and can consume a lot of memory. The memory hunger can be controlled if you pay careful attention and your use case lends itself to processing the document chunk-wise, but that takes effort.
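
For reference, chunk-wise processing with XML::Twig might look roughly like the sketch below. The sample document and its element names (log, record, name) are invented for illustration; the idea is that a handler fires once per completed element and then discards what has been parsed so far.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data; imagine many thousands of <record> elements.
my $xml = <<'XML';
<log>
  <record id="1"><name>alpha</name></record>
  <record id="2"><name>beta</name></record>
</log>
XML

my @seen;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a complete <record> element has been parsed.
        record => sub {
            my ( $t, $record ) = @_;
            push @seen,
                $record->att('id') . '=' . $record->first_child_text('name');

            # Discard everything parsed so far so it does not pile up in memory.
            $t->purge;
        },
    },
);
$twig->parse($xml);

print "$_\n" for @seen;
```

This only works when each chunk can be handled on its own; anything that needs to look back at earlier records has to be saved by hand before the purge.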

I’d instead suggest XML::LibXML. It’s a wrapper around the newer, more compliant libxml2 library which offers the nicer sorts of APIs that were designed after XML was finished – its XPath support is excellent. And since its internal data structures all reside on the C side, it can handle much larger documents than the (more) pure-Perl modules without any effort on the programmer’s part. It’s also much faster than such modules for the same reason.
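
To give a feel for it, here is a minimal sketch of XPath queries with XML::LibXML. The sample document and its element names (inventory, item, name) are made up for illustration; only the XML::LibXML calls themselves are real.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample data, invented for this example.
my $xml = <<'XML';
<inventory>
  <item type="fruit"><name>apple</name></item>
  <item type="tool"><name>hammer</name></item>
  <item type="fruit"><name>pear</name></item>
</inventory>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

# One XPath expression selects the name of every item of type "fruit".
my @fruit = map { $_->textContent }
    $doc->findnodes('/inventory/item[@type="fruit"]/name');

# findvalue evaluates an expression and returns its string value.
my $count = $doc->findvalue('count(//item)');

print "fruit: @fruit\n";
print "items: $count\n";
```

The selection logic lives entirely in the XPath string, so there is no hand-written tree walking on the Perl side.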

I use it for all of my XML needs these days and am an absolutely satisfied customer.

Makeshifts last the longest.

Re^2: Using the XML::Parser Module
by mirod (Canon) on Nov 02, 2005 at 14:25 UTC

    I would agree with you that XML::LibXML is also a good choice. In my (oddly enough limited ;--) experience, it feels a little "lower-level" than XML::Twig, mostly because it forces you to use the DOM to process the data, while XML::Twig has (lots of!) higher-level methods. I agree that it implements quite a few standards very well, and it probably lends itself to more rigorous code than XML::Twig.

    Just to correct you on one point: XML::Twig did not use to offer real XPath support, but it does now if you use XML::Twig::XPath, which simply reuses the XML::XPath engine.
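
    A minimal sketch of what that looks like, assuming XML::Twig::XPath and one of the XPath engines it relies on are installed. The sample document and its element names are made up for illustration.

    ```perl
    use strict;
    use warnings;
    use XML::Twig::XPath;

    # Hypothetical sample data, invented for this example.
    my $xml = '<inventory>'
            . '<item type="fruit"><name>apple</name></item>'
            . '<item type="tool"><name>hammer</name></item>'
            . '<item type="fruit"><name>pear</name></item>'
            . '</inventory>';

    # XML::Twig::XPath is used in place of XML::Twig and adds findnodes & co.
    my $twig = XML::Twig::XPath->new;
    $twig->parse($xml);

    # Full XPath expression, evaluated against the whole twig.
    my @fruit = map { $_->text }
        $twig->findnodes('/inventory/item[@type="fruit"]/name');

    print "fruit: @fruit\n";
    ```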

      I always think of you and cringe a little when I recommend against your module. You did a really admirable job of building something actually usable onto XML::Parser, and for a long time XML::Twig was indeed my favourite. It’s just that expat and XML::Parser really needed that work to be turned into something sane, whereas XML::LibXML is sane to begin with. Sorry. :-) :-(

      Re: XPath support: thanks for the pointer; noted.

      Makeshifts last the longest.

        Agreed. I am not sure expat is that bad, but XML::Parser is indeed a royal pain to deal with, especially when trying to be compatible with various versions.

        If I had started writing XML::Twig a little later, I would probably have written it using SAX, and I would have been able to change the parser. As it is, SAX was very new at the time and all other modules were built on top of XML::Parser, so I went with that, and then it got so entangled with XML::Parser's quirks that decoupling them would be really hard now.

        Oh, and don't feel bad about advocating XML::LibXML; it is indeed a very fine module.