slugger415 has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks, I'm using XML::Twig on Windows 7 (Strawberry Perl v5.16.3) to parse some pretty large XML files (e.g. ~800MB) and getting the old "out of memory!" error message. The script works fine on smaller files. Probably a dumb question, but am I just SOL? Is there any way to slurp through the file in manageable chunks? I'm using a remote server that I don't own so adding more memory is not an option.

(also tried XML::Simple)

Here's my code if it helps, but it's not really doing anything complicated or interesting.

    use XML::Twig;
    use strict;

    my $file = shift;

    # set up the XML parser:
    my $twig = XML::Twig->new(
        comments      => 'keep',
        twig_handlers => { row => \&row_processing },
        pretty_print  => 'indented',
    );

    print " parsing $file...\n";
    $twig->parsefile($file);
    $twig->purge;

    sub row_processing {
        my ($twig, $rows) = @_;
        print "a row\n";
    }

thanks -- Scott

Replies are listed 'Best First'.
Re: out of memory! parsing large XML file ( xml_pp )
by Anonymous Monk on Feb 13, 2014 at 04:54 UTC

      Looks like $rows->purge did the trick! Thanks for the tip, I didn't know you could purge at that level.

      Thanks to all for the tips -- I'm basically just reading the contents and attributes of a bunch of row/cell elements. (Yes I realize Twig is overkill, but I know how to use it.) And since you asked, here's a snippet:

      <?xml version="1.0" encoding="UTF-8"?>
      <xmlreport title="Enterprise Internet" dates="May 1, 2013 - May 31, 2013">
        <columns>
          <column name="Page" type="dimension">Page</column>
          <column name="Visitors" type="metric">Visitors</column>
          <column name="ABC Visitor %" type="metric">ABC Visitor %</column>
          <column name="New Visitors" type="metric">New Visitors</column>
          <column name="ABC Visitors" type="metric">ABC Visitors</column>
          <column name="Visits per Visitor" type="metric">Visits per Visitor</column>
        </columns>
        <rows>
          <row rownum="1">
            <cell columnname="page" csv="&quot;publib.boulder.ABC.com/infocenter/zvm/v6r2/topic/com.ABC.zvm.v620/zvminfoc03.htm&quot;" db="53008">publib.boulder.ABC.com/infocenter/...ic/com.ABC.zvm.v620/zvminfoc03.htm</cell>
            <cell columnname="cm_visitors" db="407">407</cell>
            <cell columnname="cm_ABCvisitor1" csv="&quot;33.7%&quot;" db="33.700000">33.7%</cell>
            <cell columnname="newvisitors" db="80">80</cell>
            <cell columnname="cm_ABCvisitors" db="137">137</cell>
            <cell columnname="cm_visitspervisitor" db="1.958231">2.0</cell>
          </row>
          <row rownum="2">
            <cell columnname="page" csv="&quot;publib.boulder.ABC.com/infocenter/zvm/v6r2/topic/com.ABC.zvm.v620/whatsin.htm&quot;" db="1136334">publib.boulder.ABC.com/infocenter/...topic/com.ABC.zvm.v620/whatsin.htm</cell>
            <cell columnname="cm_visitors" db="2">2</cell>
            <cell columnname="cm_ABCvisitor1" csv="&quot;0.0%&quot;" db="0.000000">0.0%</cell>
            <cell columnname="newvisitors" db="0">0</cell>
            <cell columnname="cm_ABCvisitors" db="0">0</cell>
            <cell columnname="cm_visitspervisitor" db="1.000000">1.0</cell>
          </row>
        </rows>
      </xmlreport>
Re: out of memory! parsing large XML file
by choroba (Cardinal) on Feb 13, 2014 at 10:40 UTC
    Shouldn't you purge in the handler?
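
    To illustrate, here is a minimal sketch of purging inside the handler, assuming the row/cell layout from the snippet posted in this thread. The inline $sample string and the @seen array are made up for the demo so it runs as-is; with the real file you would call parsefile($file) instead of parse($sample) and would not accumulate results in memory.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical, trimmed-down stand-in for the real report. Element and
# attribute names follow the snippet posted in this thread.
my $sample = <<'XML';
<xmlreport title="Enterprise Internet">
  <rows>
    <row rownum="1">
      <cell columnname="page" db="53008">zvminfoc03.htm</cell>
      <cell columnname="cm_visitors" db="407">407</cell>
    </row>
    <row rownum="2">
      <cell columnname="page" db="1136334">whatsin.htm</cell>
      <cell columnname="cm_visitors" db="2">2</cell>
    </row>
  </rows>
</xmlreport>
XML

# Collected only so the demo has something to show. For an 800MB file,
# print or store each row somewhere else instead of keeping it in memory.
my @seen;

my $twig = XML::Twig->new(
    twig_handlers => { row => \&row_processing },
);

# With a real file: $twig->parsefile($file);
$twig->parse($sample);

print "$_\n" for @seen;

sub row_processing {
    my ($t, $row) = @_;

    my $rownum = $row->att('rownum');
    for my $cell ($row->children('cell')) {
        push @seen, sprintf '%s:%s=%s',
            $rownum, $cell->att('columnname'), $cell->att('db');
    }

    # Free everything parsed so far, including this row, before the
    # parser moves on. Without this the whole tree stays in memory.
    $t->purge;
}
```

    The only difference from the original script that matters is where purge is called: once per row inside the handler, rather than once after parsefile has already built the whole tree.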

    I would recommend XML::LibXML::Reader, anyway.
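
    A rough sketch of what that could look like with the pull parser, again assuming the row/cell layout posted in this thread. The inline $sample string and the @cells array are hypothetical and exist only so the example runs on its own; for a real file the reader would be created with location => $file.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Hypothetical, trimmed-down stand-in for the real report.
my $sample = <<'XML';
<xmlreport title="Enterprise Internet">
  <rows>
    <row rownum="1">
      <cell columnname="page" db="53008">zvminfoc03.htm</cell>
      <cell columnname="cm_visitors" db="407">407</cell>
    </row>
    <row rownum="2">
      <cell columnname="page" db="1136334">whatsin.htm</cell>
      <cell columnname="cm_visitors" db="2">2</cell>
    </row>
  </rows>
</xmlreport>
XML

# With a real file: XML::LibXML::Reader->new(location => $file)
my $reader = XML::LibXML::Reader->new(string => $sample)
    or die "cannot create reader\n";

# Kept only for the demo; do not accumulate like this on a huge file.
my @cells;

# Advance to each <row> in turn; only the current row is ever expanded.
while ($reader->nextElement('row')) {
    # Deep-copy just this <row> subtree into a small DOM fragment.
    my $row    = $reader->copyCurrentNode(1);
    my $rownum = $row->getAttribute('rownum');

    for my $cell ($row->getChildrenByTagName('cell')) {
        push @cells, sprintf '%s:%s=%s (%s)',
            $rownum,
            $cell->getAttribute('columnname'),
            $cell->getAttribute('db'),
            $cell->textContent;
    }
    # $row goes out of scope here, so its memory can be reclaimed.
}

print "$_\n" for @cells;
```

    Each row is copied out as its own small subtree and dropped at the end of the loop body, so memory use should stay roughly proportional to one row rather than to the whole file.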

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: out of memory! parsing large XML file
by pajout (Curate) on Feb 13, 2014 at 10:52 UTC
    Hello Scott,
    what do you plan to do with the parsed document?