http://qs1969.pair.com?node_id=728546



Actually, XML::LibXML now has a pull parser (XML::LibXML::Reader) that doesn't read the entire DOM into memory. In my experience it is much faster than XML::Twig, and I've used it successfully.
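A minimal sketch of what pull parsing with XML::LibXML::Reader can look like. It assumes XML::LibXML is installed in a version that provides `nextElement`. The `collect_titles` helper and the `<title>` element are hypothetical, chosen only for illustration. The reader walks the document node by node, and `copyCurrentNode(1)` turns just the current element into a small DOM fragment, so the whole file is never held as a tree.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Hypothetical helper: stream through a document and return the text of
# every <title> element, without building a DOM for the whole document.
# %source is passed straight to the constructor, e.g.
#   location => 'huge.xml'   or   string => $xml
sub collect_titles {
    my (%source) = @_;
    my $reader = XML::LibXML::Reader->new(%source)
        or die "cannot create XML::LibXML::Reader\n";

    my @titles;
    # nextElement() moves forward to the next element with this name.
    # It returns 1 when one is found and 0 at the end of the document.
    while ($reader->nextElement('title') > 0) {
        # copyCurrentNode(1) expands only the current element's subtree
        # and returns a copy of it, so only this fragment becomes DOM nodes.
        push @titles, $reader->copyCurrentNode(1)->textContent;
    }
    return @titles;
}

my $xml = '<books><book><title>Perl</title></book>'
        . '<book><title>XML &amp; libxml2</title></book></books>';
print "$_\n" for collect_titles(string => $xml);
```

For a real file you would pass `location => 'huge.xml'` instead of `string => $xml`. The loop body stays the same either way.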

Re: see XML::LibXML::Reader
by Anonymous Monk on Dec 06, 2008 at 14:15 UTC
    much faster than XML::Twig
    Code sample, please?
        You said it is much faster than XML::Twig, so where is the benchmark code to back that up?
Re: see XML::LibXML::Reader
by mirod (Canon) on Dec 07, 2008 at 10:50 UTC

    Interesting. I will have to see whether I can use this as an alternate parser for XML::Twig, or create a different module altogether that combines the speed of libxml2 with the convenience (IMHO ;--) of XML::Twig.

    It would be great if you (or someone else!) could provide code examples for the "Ways to Rome" series.

      Your benchmarking methodology is much less accurate than it could be. You shouldn't be measuring the time it takes to fork a new process and load the modules; you should be measuring multiple runs and averaging the results, e.g. with timethese(-5, ...) from the Benchmark module.
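      A minimal sketch of the in-process methodology suggested above. It assumes XML::Twig and XML::LibXML are installed. The generated test document and the `count_with_twig` and `count_with_reader` helpers are hypothetical, and no timings are claimed here. Because the document is built once and both modules are loaded before timing starts, start-up cost is deliberately excluded. That is exactly the cost the next reply argues should count.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);
use XML::Twig;
use XML::LibXML::Reader;

# Hypothetical test document, built once in memory so that neither
# process start-up nor module loading is part of the timed code.
my $xml = '<books>'
        . join('', map { "<book><title>Book $_</title></book>" } 1 .. 5_000)
        . '</books>';

# Count <title> elements with XML::Twig, purging parsed elements as we go.
sub count_with_twig {
    my ($doc) = @_;
    my $count = 0;
    my $twig = XML::Twig->new(
        twig_handlers => {
            title => sub { $count++; $_[0]->purge; },
        },
    );
    $twig->parse($doc);
    return $count;
}

# Count <title> elements with the XML::LibXML::Reader pull parser.
sub count_with_reader {
    my ($doc) = @_;
    my $reader = XML::LibXML::Reader->new(string => $doc);
    my $count = 0;
    $count++ while $reader->nextElement('title') > 0;
    return $count;
}

# Both subs must agree before their speed is worth comparing.
count_with_twig($xml) == count_with_reader($xml)
    or die "parsers disagree\n";

# A negative count means "run each sub for at least 5 CPU seconds";
# Benchmark then reports the rate over all of those iterations.
my $results = timethese(-5, {
    twig   => sub { count_with_twig($xml) },
    reader => sub { count_with_reader($xml) },
});
cmpthese($results);
```

      Running each sub many times and letting Benchmark compute the rate smooths out noise from any single run. Load-time cost would have to be measured separately if it matters for your use case.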

        I happen to think that load time is important, and the time to fork should affect all tests similarly. BTW, XML::Twig probably does very badly in this respect, so you can't say I am biased.

        As far as I know, no one had ever challenged the "SAX is lightweight and fast" claim before I published this benchmark. And no one since then has come up with any figures that prove me wrong when I say "SAX is slow".

        Of course my benchmarks are imperfect. Of course I am sure you could do better. Then do it.