in reply to Re: processing massive XML files with XML::Twig
in thread processing massive XML files with XML::Twig

Did you try it? I mean, did you actually compare the performance of XML::Twig and XML::SAX? Because I did, in a simple benchmark. Look at the last table.

SAX is convenient in that, with modules like XML::SAX::Machines, it lets you create pipelines of SAX filters, plug in dumps... But it is IMHO a pain to use, and it is also demonstrably slow, at least in Perl.

Sorry, you hit one of my pet peeves ;--)

If you want better performance than XML::Twig, you can use XML::LibXML. The API is different (pure DOM plus XPath, with fewer convenience methods than XML::Twig), and it is harder to process big files with it, but XML::LibXML uses less memory than XML::Twig, so you are more likely to be able to load the entire document in memory.
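For illustration only (a hedged sketch, not from the original post; the file name 'big.xml' and the <record>/<title> elements are invented), the DOM-plus-XPath style that XML::LibXML encourages looks roughly like this:

    use strict;
    use warnings;
    use XML::LibXML;

    # Parse the whole document into memory (a libxml2 DOM, which is more
    # compact than XML::Twig's Perl data structures).
    my $parser = XML::LibXML->new;
    my $doc    = $parser->parse_file('big.xml');

    # XPath does most of the navigation work.
    for my $record ($doc->findnodes('//record')) {
        print $record->findvalue('./title'), "\n";
    }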


see XML::LibXML::Reader
by myuserid7 (Scribe) on Dec 06, 2008 at 13:56 UTC
    Actually, XML::LibXML now has a pull parser (XML::LibXML::Reader) that doesn't read the entire DOM into memory. It is much faster than XML::Twig; I've used it successfully.
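    As an illustration only (a minimal sketch, not the poster's actual code; the file name 'big.xml' and the <record>/<title> elements are made up), a pull loop with XML::LibXML::Reader could look like this:

        use strict;
        use warnings;
        use XML::LibXML::Reader;

        my $reader = XML::LibXML::Reader->new( location => 'big.xml' )
            or die "cannot read big.xml\n";

        # nextElement jumps to the next <record> without building the whole
        # DOM; only the current record is copied into a regular DOM element.
        while ( $reader->nextElement('record') ) {
            my $record = $reader->copyCurrentNode(1);    # 1 = deep copy
            print $record->findvalue('./title'), "\n";
        }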
      much faster than XML::Twig
      code sample please

      Interesting. I have to see whether I could use this as an alternate parser for XML::Twig, or create a different module altogether, one that combines the speed of libxml2 with the convenience (IMHO ;--) of XML::Twig.

      It would be great if you (or someone else!) could provide code examples for the "Ways to Rome" series.

        Your benchmarking methodology is much less accurate than it could be. You shouldn't be measuring the time it takes to fork a new process and load the modules, and you should be measuring multiple runs and averaging the results, e.g. with timethese(-5, ...).
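        For illustration only (a hedged sketch, not the poster's code; 'sample.xml' is a made-up test file), a fairer comparison with the Benchmark module could look like this:

            use strict;
            use warnings;
            use Benchmark qw(timethese);
            use XML::Twig;
            use XML::LibXML;

            my $file = 'sample.xml';    # hypothetical test document

            # Modules are loaded once, outside the timed code; the negative
            # count makes each sub run for at least 5 CPU seconds.
            timethese( -5, {
                twig   => sub {
                    my $t = XML::Twig->new;
                    $t->parsefile($file);
                    $t->dispose;    # free the tree (XML::Twig uses circular refs)
                },
                libxml => sub {
                    XML::LibXML->new->parse_file($file);
                },
            });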