Re^2: processing massive XML files with XML::Twig

by mirod (Canon)
on Dec 05, 2008 at 10:20 UTC


in reply to Re: processing massive XML files with XML::Twig
in thread processing massive XML files with XML::Twig

Did you try? I mean, did you compare the performance of XML::Twig and XML::SAX? Because I did, for a simple benchmark. Look at the last table.

SAX is convenient because, with modules like XML::SAX::Machines, it lets you build pipelines of SAX filters, plug in dumps... But it is IMHO a pain to use. It is also demonstrably slow. At least in Perl.

Sorry, you hit one of my pet peeves ;--)

If you want better performance than XML::Twig, you can use XML::LibXML. The API is different (pure DOM + XPath + fewer convenience methods than XML::Twig), and it is more difficult to process big files with it (but XML::LibXML uses less memory than XML::Twig, so you are more likely to be able to load the entire XML into memory).
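To make the API difference concrete, here is a minimal sketch that sums the same values both ways. The sample document and its element names (records, record, value) are made up for illustration and do not come from the thread. The XML::Twig half uses a handler plus purge, so memory stays flat on big inputs; the XML::LibXML half loads the whole document and queries it with XPath.

```perl
use strict;
use warnings;
use XML::Twig;
use XML::LibXML;

# Hypothetical sample data; a real "massive" file would live on disk.
my $xml = <<'XML';
<records>
  <record id="1"><value>10</value></record>
  <record id="2"><value>20</value></record>
  <record id="3"><value>30</value></record>
</records>
XML

# XML::Twig: the handler fires each time a <record> has been fully parsed.
my $twig_total = 0;
XML::Twig->new(
    twig_handlers => {
        record => sub {
            my ($twig, $record) = @_;
            $twig_total += $record->first_child_text('value');
            $twig->purge;    # free everything parsed so far
        },
    },
)->parse($xml);

# XML::LibXML: build the whole DOM in memory, then select nodes with XPath.
my $dom       = XML::LibXML->load_xml( string => $xml );
my $dom_total = 0;
$dom_total += $_->findvalue('value') for $dom->findnodes('/records/record');

print "XML::Twig total:   $twig_total\n";
print "XML::LibXML total: $dom_total\n";
```

For a file on disk, the equivalents would be parsefile('big.xml') on the twig and load_xml( location => 'big.xml' ) for XML::LibXML (the file name is a placeholder).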

see XML::LibXML::Reader
by myuserid7 (Scribe) on Dec 06, 2008 at 13:56 UTC
    Actually, XML::LibXML now has a pull parser (XML::LibXML::Reader) that doesn't read the entire DOM into memory. Much faster than XML::Twig. I've used it successfully.
      much faster than XML::Twig
      code sample please
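No sample was posted in the thread, so the following is only a rough sketch of what pull parsing with XML::LibXML::Reader can look like. The document and its element names (records, record, value) are invented for illustration. The reader advances through the input, and only the current record is expanded into a small DOM fragment.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Hypothetical sample data; for a real file use new( location => $path ).
my $xml = <<'XML';
<records>
  <record id="1"><value>10</value></record>
  <record id="2"><value>20</value></record>
  <record id="3"><value>30</value></record>
</records>
XML

my $reader = XML::LibXML::Reader->new( string => $xml )
    or die "cannot create reader\n";

my $reader_total = 0;

# nextElement returns 1 when it finds a matching element,
# 0 at end of input and -1 on error, so compare against 1 explicitly.
while ( $reader->nextElement('record') == 1 ) {
    # Deep-copy just this <record> subtree into a regular DOM node,
    # so the usual XPath methods are available on it.
    my $record = $reader->copyCurrentNode(1);
    $reader_total += $record->findvalue('value');
}

print "XML::LibXML::Reader total: $reader_total\n";
```

Whether this is actually faster than XML::Twig for a given file is something to measure rather than assume.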

      Interesting. I have to see if I can use this as an alternate parser for XML::Twig. Or create a different module altogether, one that combines the speed of libxml2 with the convenience (IMHO ;--) of XML::Twig.

      It would be great if you (or someone else!) could provide code examples for the "Ways to Rome" series.

        Your benchmarking methodology is much less accurate than it could be. You shouldn't be measuring the time it takes to fork a new process and load the modules, and you should be measuring multiple runs and averaging the results, e.g. timethese(-5, ...).
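One way to follow that advice, sketched with the core Benchmark module: load both parsers once, build the input once, and let timethese(-5, ...) run each candidate for at least 5 CPU seconds. The generated document and the helper names (sum_with_twig, sum_with_libxml, bench_parsers) are assumptions made for this example, not part of the original benchmark.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);
use XML::Twig;      # modules are loaded once, outside the timed code
use XML::LibXML;

# Hypothetical input: 1000 small records, built once before timing starts.
my $bench_xml = '<records>'
    . join( '', map { qq{<record id="$_"><value>$_</value></record>} } 1 .. 1000 )
    . '</records>';

sub sum_with_twig {
    my ($xml) = @_;
    my $total = 0;
    XML::Twig->new(
        twig_handlers => {
            record => sub {
                my ( $twig, $record ) = @_;
                $total += $record->first_child_text('value');
                $twig->purge;
            },
        },
    )->parse($xml);
    return $total;
}

sub sum_with_libxml {
    my ($xml) = @_;
    my $doc   = XML::LibXML->load_xml( string => $xml );
    my $total = 0;
    $total += $_->findvalue('value') for $doc->findnodes('/records/record');
    return $total;
}

# A negative count tells Benchmark to run each sub for at least
# that many CPU seconds instead of a fixed number of iterations.
sub bench_parsers {
    my ( $count, $style ) = @_;
    return timethese(
        $count,
        {
            'XML::Twig'   => sub { sum_with_twig($bench_xml) },
            'XML::LibXML' => sub { sum_with_libxml($bench_xml) },
        },
        $style // 'auto',
    );
}

cmpthese( bench_parsers(-5) );
```

Because only the parse-and-sum work sits inside the timed subs, process startup and module loading no longer skew the comparison.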
