Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello
I'm using the following snippet to parse a 12 MB XML file on a Windows box:
    use XML::Twig;
    my $twig = XML::Twig->new;
    # Parse the XML file
    $twig->parsefile("$rrd.xml");
On a beefy Windows box this takes literally hours, but only 10 minutes on a Unix box.

I know there is a substantial difference between the two processors (64-bit vs 32-bit), but are there any changes I can make to the code to make it faster on Windows?

Re: XML Parse performance
by mirod (Canon) on Mar 19, 2003 at 17:06 UTC

    Well, the code is about as simple as can be... so it might be hard to simplify it.

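    The first thing to try is to see whether the difference really comes from XML::Twig or from XML::Parser: run perl -MXML::Parser -e'XML::Parser->new->parsefile("rrd.xml")' and see whether the parsing time is similar on the two machines. (On Windows, cmd.exe does not treat single quotes as quoting characters, so write it as perl -MXML::Parser -e "XML::Parser->new->parsefile('rrd.xml')" there.) Then you can try adding dummy handlers to the parser to see what's going on.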
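    A minimal timing sketch of that comparison, meant to be run unchanged on both machines. It assumes XML::Parser and XML::Twig are installed and uses the core modules Time::HiRes and File::Temp. When no file name is given it writes a small synthetic document with a hypothetical <rrd>/<row>/<v> layout, so the script can be smoke-tested anywhere. If step 1 is already slow on the Windows box, that points at XML::Parser/expat or the machine itself rather than at XML::Twig.

```perl
#!/usr/bin/perl
# Sketch: time a bare XML::Parser parse, XML::Parser with dummy handlers,
# and a full XML::Twig tree build on the same file.
use strict;
use warnings;
use File::Temp qw(tempfile);
use Time::HiRes qw(time);
use XML::Parser;
use XML::Twig;

my $file = shift @ARGV;
unless (defined $file) {
    # No file given: write a small synthetic document (hypothetical layout)
    # so the script can still be exercised.
    my $fh;
    ($fh, $file) = tempfile(SUFFIX => '.xml', UNLINK => 1);
    print {$fh} "<rrd>\n";
    print {$fh} "  <row><v>$_</v></row>\n" for 1 .. 1000;
    print {$fh} "</rrd>\n";
    close $fh or die "close: $!";
}

my %seconds;

# 1. expat via XML::Parser with no handlers: raw parsing cost only
my $t0 = time();
XML::Parser->new->parsefile($file);
$seconds{parser} = time() - $t0;

# 2. XML::Parser with dummy handlers: adds one Perl sub call per event
my $elements = 0;
$t0 = time();
XML::Parser->new(
    Handlers => {
        Start => sub { $elements++ },
        End   => sub { },
        Char  => sub { },
    },
)->parsefile($file);
$seconds{handlers} = time() - $t0;

# 3. XML::Twig building the whole tree, as in the original snippet
$t0 = time();
my $twig = XML::Twig->new;
$twig->parsefile($file);
$seconds{twig} = time() - $t0;

printf "XML::Parser, no handlers:    %.3fs\n", $seconds{parser};
printf "XML::Parser, dummy handlers: %.3fs (%d elements)\n",
    $seconds{handlers}, $elements;
printf "XML::Twig, full tree:        %.3fs\n", $seconds{twig};
```

Comparing the three numbers across the two boxes shows at which layer the time goes.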

    Is there a big difference in the amount of memory on the two computers? A 12 MB file should translate into something like 150 MB in memory, depending on the amount of tagging in the XML.

    Then, do you really need to process all of the data? If not, using twig_roots would reduce the memory needed and possibly speed up the code (if you skip big chunks of the data).
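    A sketch of what twig_roots looks like, assuming only the data-source definitions are needed. The element names <ds>, <name> and <row> are hypothetical placeholders, and the document is parsed from an inline string so the example is self-contained; for the real file you would call parsefile instead.

```perl
#!/usr/bin/perl
# Sketch: build the tree only for the elements you actually need.
# <ds>, <name>, <database>, <row> are hypothetical element names.
use strict;
use warnings;
use XML::Twig;

my $xml = <<'XML';
<rrd>
  <ds><name>in</name></ds>
  <ds><name>out</name></ds>
  <database>
    <row><v>1</v><v>2</v></row>
    <row><v>3</v><v>4</v></row>
  </database>
</rrd>
XML

# Only <ds> elements become twig objects (as children of the root);
# the <database> section is still read by the parser but never built.
my $twig = XML::Twig->new(twig_roots => { ds => 1 });
$twig->parse($xml);   # for a real file: $twig->parsefile("$rrd.xml")

my @names = map { $_->first_child_text('name') } $twig->root->children('ds');
print "data sources: @names\n";
```

The larger the skipped sections are compared with the kept ones, the more memory (and tree-building time) this saves.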

    If you find something that explains the difference I would be very interested in knowing about it, thanks.

Re: XML Parse performance
by Elian (Parson) on Mar 19, 2003 at 18:47 UTC
    Check how much memory is on the two boxes, and monitor swapping. It's distinctly possible that the Windows box has insufficient memory to handle the parsing and is swapping, while the Unix box may well have more memory and avoid swapping. That would definitely kill your performance.

    You also might want to check whether any optional modules are missing on the Windows side. Many modules fall back to a pure-Perl implementation if the XS version isn't available, so an operation still works, albeit much more slowly.


      Not likely in this case. The only modules that could affect performance are Scalar::Util, which provides weaken but would improve performance only when deleting parts of the tree, and Encode or Text::Iconv if an output_encoding was specified, which is not the case here (and they would be used only when printing the twig anyway).

      Insufficient memory is a definite possibility though.
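    A quick way to compare the two installations is to run the same module check on both boxes. This is a sketch using only core Perl (require inside eval); the module list is simply the modules named in this thread.

```perl
#!/usr/bin/perl
# Sketch: report which of the modules discussed here are installed,
# and whether Scalar::Util provides weaken().
use strict;
use warnings;

my @modules = qw(XML::Parser XML::Twig Scalar::Util Encode Text::Iconv);

my %status;
for my $mod (@modules) {
    # Turn Foo::Bar into Foo/Bar.pm so require accepts it as a string.
    (my $path = "$mod.pm") =~ s{::}{/}g;
    if (eval { require $path; 1 }) {
        my $version = $mod->VERSION;
        $status{$mod} = defined $version ? $version : 'installed';
    }
    else {
        $status{$mod} = 'MISSING';
    }
    printf "%-14s %s\n", $mod, $status{$mod};
}

# If weaken() is not defined, weak references are unavailable
# (older pure-Perl builds of Scalar::Util lacked it).
my $has_weaken = eval { require Scalar::Util; defined &Scalar::Util::weaken } ? 1 : 0;
print "Scalar::Util::weaken ", ($has_weaken ? "available" : "NOT available"), "\n";
```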

Re: XML Parse performance
by grantm (Parson) on Mar 19, 2003 at 21:45 UTC

    As mirod and Elian have observed, limited memory leading to swapping is the most likely cause of the problem. If you slurp the whole document into a document tree then you'll likely have exactly the same problems regardless of which module you use for parsing. However, the main point of XML::Twig is to enable you to reap the benefits of a tree based model without having to have the whole file in memory. With judicious use of the twig_roots mode, you can have XML::Twig hand your code a tree representation of one section of the document. When you've finished working with that section, discard it and the memory will be reused for the next section.
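    A sketch of the "process a section, then discard it" pattern described above. It assumes each repeating <row> element (a hypothetical name) can be handled on its own; the handler receives the complete subtree and then calls $t->purge, which deletes everything that has been completely parsed so far. The inline document stands in for the real file.

```perl
#!/usr/bin/perl
# Sketch: handle one section at a time and free it afterwards, so memory
# use stays roughly flat regardless of file size. <row>/<v> are hypothetical.
use strict;
use warnings;
use XML::Twig;

my ($rows, $sum) = (0, 0);

my $twig = XML::Twig->new(
    twig_roots => {
        row => sub {
            my ($t, $row) = @_;
            # The complete <row> subtree is available here as a tree.
            $sum += $_->text for $row->children('v');
            $rows++;
            # Discard everything parsed so far; the memory is reused
            # for the next section.
            $t->purge;
        },
    },
);

my $xml = '<rrd><database>'
        . join('', map "<row><v>$_</v><v>1</v></row>", 1 .. 5)
        . '</database></rrd>';
$twig->parse($xml);   # for a real file: $twig->parsefile("$rrd.xml")

print "rows: $rows, sum: $sum\n";
```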

Re: XML Parse performance
by Anonymous Monk on Mar 21, 2003 at 05:19 UTC
    It also matters whether the Windows box has to download the DTD for the file it is parsing rather than using a local copy. Not really a *Perl* issue, but worth looking into. I noticed my own XSL processing (on Linux) got a LOT faster after downloading and installing the DTD locally.