Re: processing massive XML files with XML::Twig

XML::Twig is a great way to go, though you may find better performance if you refactor your approach to use SAX via the CPAN XML::SAX module.

Update: hmm, after running some benchmarks myself, I stand corrected. I seem to have been passing on this duff knowledge for too long; thank you to mirod for opening my eyes. XML::Twig is indeed faster in a lot of situations, and I think you are going down the right route, short of perhaps considering another tool outside of Perl.
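
For anyone landing here from a search, the chunk-at-a-time style that makes XML::Twig workable on massive files can be sketched roughly as below. This is only an illustration: the `records`/`record`/`name` element names and the `id` attribute are made up, and the inline string stands in for a real file.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input: a flat list of <record> elements.
my $xml = <<'XML';
<records>
  <record id="1"><name>alpha</name></record>
  <record id="2"><name>beta</name></record>
</records>
XML

my @seen;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a complete <record> element has been parsed.
        record => sub {
            my ($t, $record) = @_;
            push @seen, $record->att('id') . '=' . $record->first_child_text('name');
            # Free everything parsed so far, so memory use stays flat.
            $t->purge;
        },
    },
);

$twig->parse($xml);    # for a real file: $twig->parsefile($path)

print "$_\n" for @seen;
```

The call to `purge` inside the handler is what keeps memory bounded; without it the whole tree is kept until parsing ends.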

Re^2: processing massive XML files with XML::Twig by mirod (Canon) on Dec 05, 2008 at 10:20 UTC
Did you try? I mean, did you compare the performance of XML::Twig and XML::SAX? Because I did, for a simple benchmark. Look at the last table. SAX is convenient because, with modules like SAX::Machines, it allows you to create pipelines of SAX filters, plug in dumps... But it is IMHO a pain to use. It is also demonstrably slow. At least in Perl. Sorry, you hit one of my pet peeves ;--) If you want better performance than XML::Twig, you can use XML::LibXML. The API is different (pure DOM + XPath + fewer convenience methods than XML::Twig), and it is more difficult to process big files (but XML::LibXML uses less memory than XML::Twig, so you are more likely to be able to load the entire XML in memory).
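
To make the "pure DOM + XPath" style concrete, here is a minimal sketch of loading a whole document with XML::LibXML and querying it. The element and attribute names are hypothetical, and the approach assumes the document fits in memory.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical input; a real script would use load_xml(location => $path).
my $xml = <<'XML';
<records>
  <record id="1"><name>alpha</name></record>
  <record id="2"><name>beta</name></record>
</records>
XML

# Parse the entire document into an in-memory DOM tree.
my $dom = XML::LibXML->load_xml(string => $xml);

# XPath query returning a list of nodes.
my @names = map { $_->textContent } $dom->findnodes('/records/record/name');

# XPath query returning the string value of the first match.
my $second = $dom->findvalue('/records/record[@id="2"]/name');

print "names: @names\n";
print "record 2: $second\n";
```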
see XML::LibXML::Reader by myuserid7 (Scribe) on Dec 06, 2008 at 13:56 UTC
actually, XML::LibXML now has a pull parser (XML::LibXML::Reader) that doesn't read the entire dom into memory. much faster than XML::Twig. i've used it successfully.
Re: see XML::LibXML::Reader by Anonymous Monk on Dec 06, 2008 at 14:15 UTC
"much faster than XML::Twig"
code sample please
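
As a starting point, pull parsing with XML::LibXML::Reader might look something like the sketch below. It is not taken from any poster's code: the `record` element, its `id` attribute and its `name` child are hypothetical, and the inline string stands in for a large file.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Hypothetical input; for a real file use new(location => $path).
my $xml = <<'XML';
<records>
  <record id="1"><name>alpha</name></record>
  <record id="2"><name>beta</name></record>
</records>
XML

my $reader = XML::LibXML::Reader->new(string => $xml)
    or die "cannot create reader\n";

my @seen;

# nextElement returns 1 when it lands on a matching element,
# 0 at end of input and -1 on error, so compare against 1.
while ($reader->nextElement('record') == 1) {
    # Build a small DOM subtree for this one record only.
    my $node = $reader->copyCurrentNode(1);
    push @seen, $node->getAttribute('id') . '=' . $node->findvalue('name');
}

print "$_\n" for @seen;
```

Only one `record` subtree is materialised at a time, which is what keeps memory use low compared with loading the full DOM.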
Re^2: see XML::LibXML::Reader by myuserid7 (Scribe) on Dec 06, 2008 at 17:51 UTC
Re^3: see XML::LibXML::Reader by Anonymous Monk on Dec 07, 2008 at 02:51 UTC
Re: see XML::LibXML::Reader by mirod (Canon) on Dec 07, 2008 at 10:50 UTC
Interesting. I have to see if I can use this as an alternate parser for XML::Twig. Or create a different module altogether, one that combines the speed of libxml2 with the convenience (IMHO ;--) of XML::Twig. It would be great if you (or someone else!) could provide code examples for the "Ways to Rome" series.
Re^2: see XML::LibXML::Reader by Anonymous Monk on Dec 07, 2008 at 21:52 UTC
Re^3: see XML::LibXML::Reader by mirod (Canon) on Dec 08, 2008 at 10:01 UTC