Re: Prune Twig From Huge XML File

I checked, and when you use ignore_elts, the data in the ignored element is never loaded. So there is no reason why the code shouldn't be fast.

Indeed, the following code takes 0.2s on my (rather slow) machine to prune a 200 MB document containing 20 content elements, each containing a 10 MB CDATA section:

#!/usr/bin/perl

use strict;
use warnings;
 
use XML::Twig;

XML::Twig->new(
    ignore_elts   => { content => 1 },
    twig_handlers => { _default_ => sub { $_->flush } },
    keep_spaces   => 1,
)->parsefile('doc_with_big_content.xml');
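To try the same approach without a 200 MB file, here is a minimal sketch that runs it on a tiny in-memory document and captures the pruned output in a string. The element names other than content, the sample payloads, and the variable names are made up for illustration. It assumes that flush accepts an optional filehandle as its first argument, as the XML::Twig documentation describes.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

# Tiny stand-in for the huge file (hypothetical structure, for illustration only).
my $xml = <<'XML';
<doc><item id="1"><title>first</title><content><![CDATA[big payload 1]]></content></item><item id="2"><title>second</title><content><![CDATA[big payload 2]]></content></item></doc>
XML

# Collect the flushed output in a scalar instead of printing to STDOUT.
my $pruned = '';
open my $out, '>', \$pruned or die "cannot open in-memory handle: $!";

XML::Twig->new(
    ignore_elts   => { content => 1 },                          # skip content elements entirely
    twig_handlers => { _default_ => sub { $_->flush($out) } },  # write out and free each element
    keep_spaces   => 1,
)->parse($xml);

close $out;

print $pruned;
```

The printed document should keep the item and title elements and contain no content elements. A handler such as content => sub { $_->delete } would give similar output, but the handler only fires once the element has been parsed in full, so the large CDATA section is loaded before it is discarded.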

Now I have to see if I can improve the "snipping" part, maybe by giving the option to not buffer the entire text for each element. How big is your file, BTW?

Re^2: Prune Twig From Huge XML File by andergoo (Initiate) on Mar 18, 2009 at 06:13 UTC
Yes, `ignore_elts` works perfectly and is very fast in removing the content. I was stupidly trying to use `content->delete` in a handler. My file is ~250MB, the biggest content chunk is 75MB.