in reply to processing massive XML files with XML::Twig

XML::Twig is a great way to go, though you may find better performance if you refactor your approach to use SAX via the CPAN XML::SAX module.

update: hmmmm, after running some benchmarks myself... I stand corrected. I seem to have been passing on this duff knowledge for too long; thank you to mirod for opening my eyes. XML::Twig is indeed faster in a lot of situations, and I think you are going down the right route, short of perhaps considering another tool outside of Perl.
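
For what it's worth, here is a minimal sketch of the chunk-at-a-time XML::Twig style that makes it workable on big files. The `<catalog>`/`<item>`/`<title>` layout and the inline sample data are made up for illustration; on a real file you would call `parsefile` instead of `parse`.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample document, standing in for a large file
my $xml = <<'XML';
<catalog>
  <item id="1"><title>First</title></item>
  <item id="2"><title>Second</title></item>
</catalog>
XML

my @titles;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a complete <item> element has been parsed
        item => sub {
            my ($t, $elt) = @_;
            push @titles, $elt->att('id') . ': ' . $elt->first_child_text('title');
            $t->purge;    # release everything parsed so far to keep memory flat
        },
    },
);

$twig->parse($xml);    # for a file on disk: $twig->parsefile('big.xml')

print "$_\n" for @titles;
```

The `purge` call in the handler is what keeps memory use from growing with the size of the input; without it the whole tree is kept.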

Re^2: processing massive XML files with XML::Twig
by mirod (Canon) on Dec 05, 2008 at 10:20 UTC

    Did you try? I mean, did you compare the performance of XML::Twig and XML::SAX? Because I did, in a simple benchmark. Look at the last table.

    SAX is convenient because, with modules like XML::SAX::Machines, it lets you create pipelines of SAX filters, plug in dumps... But it is IMHO a pain to use. It is also demonstrably slow. At least in Perl.

    Sorry, you hit one of my pet peeves ;--)

    If you want better performance than XML::Twig, you can use XML::LibXML. The API is different (pure-DOM + XPath + fewer convenience methods than XML::Twig), and it is more difficult to process big files (but XML::LibXML uses less memory than XML::Twig, so you are more likely to be able to load the entire XML in memory).
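
To give an idea of the DOM + XPath style, here is a minimal sketch that loads a whole document and queries it. The `<catalog>`/`<item>`/`<title>` layout and the sample data are hypothetical; for a file on disk you would pass `location => 'file.xml'` to `load_xml` instead of `string`.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample document
my $xml = <<'XML';
<catalog>
  <item id="1"><title>First</title></item>
  <item id="2"><title>Second</title></item>
</catalog>
XML

# Parse the entire document into memory as a DOM tree
my $doc = XML::LibXML->load_xml(string => $xml);

# Select nodes with XPath, then read values through DOM methods
my @ids = map { $_->getAttribute('id') }
          $doc->findnodes('/catalog/item[title="Second"]');

my $first_title = $doc->findvalue('/catalog/item[@id="1"]/title');

print "ids: @ids\n";
print "first title: $first_title\n";
```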

      Actually, XML::LibXML now has a pull parser (XML::LibXML::Reader) that doesn't read the entire DOM into memory. It is much faster than XML::Twig. I've used it successfully.
        much faster than XML::Twig
        code sample please
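
Something along these lines is presumably what is meant: a minimal sketch of a pull-parsing loop with XML::LibXML::Reader, not the previous poster's actual code. The `<catalog>`/`<item>`/`<title>` layout and the sample data are hypothetical; for a real file, pass `location => 'big.xml'` instead of `string`.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Hypothetical sample document, standing in for a large file
my $xml = <<'XML';
<catalog>
  <item id="1"><title>First</title></item>
  <item id="2"><title>Second</title></item>
</catalog>
XML

my $reader = XML::LibXML::Reader->new(string => $xml)
    or die "cannot create reader\n";

my @titles;

# nextElement() returns 1 each time it reaches another <item> start tag
while ($reader->nextElement('item') == 1) {
    # Build a small DOM fragment for this one element only
    my $item = $reader->copyCurrentNode(1);
    push @titles, $item->getAttribute('id') . ': ' . $item->findvalue('title');
}

print "$_\n" for @titles;
```

Only one `<item>` subtree is turned into DOM nodes at a time, so memory use should depend on the size of an item rather than the size of the file. This sketch assumes `<item>` elements are not nested inside each other.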

        Interesting. I have to see if I can use this as an alternate parser for XML::Twig. Or create a different module altogether, one that combines the speed of libxml2 with the convenience (IMHO ;--) of XML::Twig.

        It would be great if you (or someone else!) could provide code examples for the "Ways to Rome" series.