in reply to Recursive XML navigation XML::Twig help!

Just to add: I can recursively extract all the elements, but the problem is that <node id="0"> comes last, i.e. Twig is performing the entire parse in memory (the <geography> node is 35MB) and runs out of memory...

What I want is to parse <node id="0"> and purge before moving on to the next node.

What I have at the moment, which is running out of memory, is:

#! /usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my $file = $ARGV[0];
my $twig= new XML::Twig( TwigRoots=>{geography=>1},
                         TwigHandlers=> { 'node' =>\&geog_node } );
$twig->parsefile($file);

sub geog_node {
    my($twig, $section) = @_;
    if($section->att('hidden') eq '0') {
        print"Data for node: \n";
        print"\tID:\t",$section->att('id'),"\n";
        print "\tDesc:\t",$section->first_child('description')->text,"\n";
    }
}

Re^2: Recursive XML navigation XML::Twig help!
by mirod (Canon) on Oct 02, 2006 at 08:21 UTC

    With XML::Twig, you usually free the memory by purge-ing the twig once you're done with the data. Of course, your problem here is that after you're done with the data in the inner node you can't use purge, as it would remove the data about the outer node, which you still need.

    I see 2 alternatives. The first: if you just delete a node after you're done displaying it, then you're good. By the time the handler is called on a node, all of its inner nodes have already been processed (and indeed deleted), so you can delete it, memory is freed, everybody's happy. Just add $section->delete at the end of geog_node.
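
    A minimal sketch of that first alternative, in case it helps. It parses a small in-memory string instead of your file; the sample XML and the @seen array are made up for illustration, and I'm assuming your real document has the same node/description layout. It uses the twig_roots/twig_handlers spelling of the options.

    ```perl
    use strict;
    use warnings;
    use XML::Twig;

    my @seen;   # ids of the visible nodes, in the order the handler reports them

    # Hypothetical sample data: one outer node holding two inner nodes.
    my $xml = '<data><geography>'
            . '<node id="0" hidden="0"><description>World</description>'
            .   '<node id="1" hidden="0"><description>Europe</description></node>'
            .   '<node id="2" hidden="1"><description>Secret</description></node>'
            . '</node>'
            . '</geography></data>';

    my $twig = XML::Twig->new(
        twig_roots    => { geography => 1 },
        twig_handlers => { node => \&geog_node },
    );
    $twig->parse($xml);

    # Called each time a <node> element has been completely parsed,
    # so the inner nodes are handled before the node that contains them.
    sub geog_node {
        my ($twig, $section) = @_;
        my $hidden = $section->att('hidden');
        if (defined $hidden && $hidden eq '0') {
            push @seen, $section->att('id');
            print "Data for node:\n";
            print "\tID:\t", $section->att('id'), "\n";
            print "\tDesc:\t", $section->first_child('description')->text, "\n";
        }
        # Done with this node: remove it from the twig so its memory can be reclaimed.
        $section->delete;
    }
    ```

    Note that with this approach the inner nodes are still reported before the outer one; it only addresses the memory use, not the output order.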

    The other alternative is a little more complex, but has the advantage of outputting the nodes in the order in which they appear in the document (starting with the first outer node): when you get to the first node to be processed, which will be an inner node, you already have all the information you need to output all of its ancestors (their id and description have already been parsed)... so just do it. Go back to the ancestors and output them one by one. Mark them as output so you don't do it again for the next inner node... et voilà! At that point you can safely purge the twig, as you've used all of its relevant info.

    #! /usr/bin/perl
    use strict;
    use warnings;
    use XML::Twig;

    my $file = $ARGV[0];
    my $twig= new XML::Twig( TwigRoots=>{geography=>1},
                             TwigHandlers=> { 'node' =>\&geog_node } );
    $twig->parsefile($file);

    sub geog_node {
        my($twig, $node) = @_;
        # ancestors_or_self returns the node first, the ancestors from inner to outer
        # hence the reverse
        foreach my $node_to_output ( reverse $node->ancestors_or_self( 'node[!@output]')) {
            if($node_to_output->att('hidden') eq '0') {
                print node_summary( $node_to_output);
            }
            $node_to_output->set_att( output => 1); # mark so we don't try to output them again
        }
        $twig->purge; # now it's safe to purge
    }

    sub node_summary {
        my $node= shift;
        return sprintf "Data for node:\tID:\t%s\n\tDesc:\t%s\n",
                       $node->att('id'),
                       $node->field('description');
    }