in reply to Prune Twig From Huge XML File

Assuming you already have the trimmed contents somewhere and you don't mind that the order of the child tags is not kept, this might work. The content of the <content> tag will be skipped by expat, so it should not waste memory:

use strict;
use XML::Rules;

my @contents = ('first trimmed content', 'second trimmed content');

my $parser = XML::Rules->new(
    style => 'filter',
    start_rules => {
        content => 'skip',
    },
    rules => {
        _default => 'raw',
        product => sub {
            my ($tag, $attr, $parser) = @_[0,1,4];
            $attr->{content} = [ $contents[ $parser->{pad}++ ] ];
            return $tag => $attr;
        }
    },
);

$parser->filter(\*DATA);

__DATA__
<document>
  <product>
    <date>2008-10-15</date>
    <price>124</price>
    <content>heinous amount of unwanted text</content>
    <color>red</color>
  </product>
  <product>
    <date>2009/01/30</date>
    <price>10</price>
    <content>heinous amount of unwanted text</content>
    <color>black</color>
  </product>
</document>

Or, better formatted, but with an even less defined order of the child tags of <product>, and assuming that none of those child tags have attributes or children:

use strict;
use XML::Rules;

my @contents = ('first trimmed content', 'second trimmed content');

my $parser = XML::Rules->new(
    style => 'filter',
    ident => ' ',
    stripspaces => 3,
    start_rules => {
        content => 'skip',
    },
    rules => {
        _default => 'content array',
        product => sub {
            my ($tag, $attr, $parser) = @_[0,1,4];
            $attr->{content} = [ $contents[ $parser->{pad}++ ] ];
            return $tag => $attr;
        }
    },
);

$parser->filter(\*DATA);

__DATA__
<document>
  <product>
    <date>2008-10-15</date>
    <price>124</price>
    <content>heinous amount of unwanted text</content>
    <color>red</color>
  </product>
  <product>
    <date>2009/01/30</date>
    <price>10</price>
    <content>heinous amount of unwanted text</content>
    <color>black</color>
  </product>
</document>

If the order is important, you'd have to take the first version of the code and tweak the handler of the <product> tag to insert the content at the right place in the array @{$attr->{_content}}.
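A rough sketch of that tweak, in plain Perl so it runs without XML::Rules. It assumes that with the 'raw' rule the _content array holds text strings interleaved with [tagname => attrs] pairs; the sample array below is hand-built to mimic that shape, and insert_after_tag() is a made-up helper name, not part of the module. Since the <content> tag is skipped, its original position is not recorded, so the sketch simply places the new child after a sibling you name (here <price>):

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Rules): insert $new_child right after
# the first child element named $after_tag, or append it if there is none.
# Returns the index at which the new child ended up.
sub insert_after_tag {
    my ($content, $after_tag, $new_child) = @_;
    for my $i (0 .. $#{$content}) {
        my $item = $content->[$i];
        # Element children are array refs [name => attrs]; plain strings are text.
        next unless ref $item eq 'ARRAY' and $item->[0] eq $after_tag;
        splice @{$content}, $i + 1, 0, $new_child;
        return $i + 1;
    }
    push @{$content}, $new_child;
    return $#{$content};
}

# Assumed shape of @{$attr->{_content}} for one <product> (hand-built sample).
my @content = (
    "\n  ", [ date  => { _content => '2008-10-15' } ],
    "\n  ", [ price => { _content => '124' } ],
    "\n  ", [ color => { _content => 'red' } ],
    "\n",
);

my $pos = insert_after_tag(\@content, 'price',
    [ content => { _content => 'first trimmed content' } ]);

# Show the resulting order of the child elements.
print join(' ', map { ref $_ ? $_->[0] : () } @content), "\n";
# prints: date price content color
```

Inside the <product> handler you would presumably call it as insert_after_tag($attr->{_content}, 'price', [content => {_content => $contents[ $parser->{pad}++ ]}]) in place of the $attr->{content} = [...] assignment. I have not run that against a real file, so check the output before trusting it.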

Re^2: Prune Twig From Huge XML File
by Anonymous Monk on Mar 17, 2009 at 09:09 UTC
    This is great stuff, thanks!
    I haven't used XML::Rules 'til now, looks like it's worth having a look at.