in reply to Prune Twig From Huge XML File
I checked, and when you use ignore_elts, the data in the ignored element is never loaded. So there is no reason why the code shouldn't be fast.
Indeed, the following code takes 0.2s on my (rather slow) machine to prune a 200 MB document containing 20 content elements, each containing a 10 MB CDATA section:
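    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Twig;

    XML::Twig->new(
        # skip the content elements entirely; their data is never loaded
        ignore_elts   => { content => 1 },
        # flush every other element as soon as it has been parsed
        twig_handlers => { _default_ => sub { $_->flush } },
        keep_spaces   => 1,
    )->parsefile('doc_with_big_content.xml');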
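To reproduce the timing, a test file of roughly that shape could be generated with something like the sketch below. The `doc`, `item` and `title` element names, the filler character and the `write_test_doc` helper are assumptions made for illustration, not necessarily what the file used above looked like; only the `content` element and the CDATA section come from the description.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: writes $count items, each holding a <content>
# element with a CDATA section of $cdata_bytes filler characters.
# Returns the size of the generated file in bytes.
sub write_test_doc {
    my ($path, $count, $cdata_bytes) = @_;

    open my $out, '>', $path or die "cannot write $path: $!";
    print {$out} qq{<?xml version="1.0"?>\n<doc>\n};

    # plain 'x' filler, so the CDATA section can never contain "]]>"
    my $filler = 'x' x $cdata_bytes;
    for my $i (1 .. $count) {
        print {$out} qq{  <item id="$i"><title>item $i</title>},
                     "<content><![CDATA[$filler]]></content></item>\n";
    }

    print {$out} "</doc>\n";
    close $out or die "cannot close $path: $!";
    return -s $path;
}

# Only runs when called with: path, number of items, bytes per CDATA section
write_test_doc(@ARGV) if @ARGV == 3;
```

Running it as `perl make_doc.pl doc_with_big_content.xml 20 10485760` (the script name is arbitrary) gives 20 content elements of 10 MB each, about 200 MB in total.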
Now I have to see if I can improve the "snipping" part. Maybe by giving the option to not buffer the entire text for each element. How big is your file BTW?
Re^2: Prune Twig From Huge XML File
by andergoo (Initiate) on Mar 18, 2009 at 06:13 UTC