in reply to Ignore elements using twig module

I'm not sure how much the following will speed up processing, but it does filter the unwanted elements:

use strict;
use warnings;
use XML::Twig;

my $xml = <<XML;
<doc>
  <data>
    <type>xxx</type>
    <vars>a</vars>
  </data>
  <data>
    <type>yyy</type>
    <vars>b</vars>
  </data>
</doc>
XML

my $root = XML::Twig->new (twig_handlers => {data => \&handler});
$root->parse ($xml);

sub handler {
    my $elt = $_;
    return if $elt->children (\&badType);
    print "Handling ", $elt->text (), "\n";
}

sub badType {
    return $_->text () =~ /^yyy/;
}

Prints:

Handling xxxa

Perl is environmentally friendly - it saves trees

Re^2: Ignore elements using twig module
by basalto (Beadle) on Feb 23, 2008 at 11:38 UTC
    It seems the ignore() method is what I'm looking for to stop processing and delete the current <data> twig.

    I'm going to try it in my script and I'll come back as soon as I have results.

    GrandFather, your sample could be handy, but I don't think it helps in my specific case, because I need to stop and purge the current twig as soon as the type element matches.

    To be clearer, here is a better sample. My XML file has thousands of <container> elements with thousands of text elements to extract and import into a database. My idea is to "twig" all <container> elements, but to speed things up I need to exclude containers that match certain types. To make it more difficult, <container> elements can be nested.

    <container>
      <attribute>
        <type>xxx</type>
      </attribute>
      <data>
        <var1>a</var1>
        <var2>b</var2>
      </data>
    </container>
    <container>
      <attribute>
        <type>yyy</type>
      </attribute>
      <variables>
        <var1>a</var1>
        <var2>b</var2>
      </variables>
    </container>
      Hi,

      Sorry for the delay, but I don't have much time to spend coding. This is not my job and I'm doing this just to gain some skills in processing XML data.

      Concerning my initial question, I can say that after I applied the ignore() method in my program, processing time dropped considerably, as expected. Parsing time fell by 33% for a 270 MB input file (the initial code took 9m24s and it now takes only 6m16s).
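
For anyone who finds this thread later, here is a minimal sketch of the idea, not the actual script. It assumes that an unwanted container is one whose <type> text starts with "yyy" (a placeholder pattern), that <type> appears before the bulk of the container's content, and that ignore() is called on the enclosing element from a handler on the <type> element. The handler names check_type and handle_container and the @kept array are made up for illustration. Nested containers are not handled specially here: purge() in the inner handler will discard the inner container before the outer one is processed.

```perl
use strict;
use warnings;
use XML::Twig;

# Sample input modelled on the two <container> elements shown earlier in the thread.
my $xml = q{<doc>
  <container>
    <attribute><type>xxx</type></attribute>
    <data><var1>a</var1><var2>b</var2></data>
  </container>
  <container>
    <attribute><type>yyy</type></attribute>
    <variables><var1>a</var1><var2>b</var2></variables>
  </container>
</doc>};

my @kept;    # one "type: values" string per container that was not ignored

my $twig = XML::Twig->new (
    twig_handlers => {
        # Called when </type> is reached, before the rest of the container is parsed.
        'container/attribute/type' => \&check_type,
        # Called at </container> for containers that were not ignored.
        'container' => \&handle_container,
    },
);
$twig->parse ($xml);
print "$_\n" for @kept;

sub check_type {
    my ($t, $type) = @_;
    return unless $type->text () =~ /^yyy/;    # placeholder "unwanted" pattern
    # Stop building the nearest enclosing <container> and drop what was built so far.
    $type->parent ('container')->ignore ();
    return;
}

sub handle_container {
    my ($t, $container) = @_;
    my $type_elt = $container->first_descendant ('type');
    my $type = $type_elt ? $type_elt->text () : '?';
    # Collect the text of every <var1>, <var2>, ... below this container.
    my @vars = map { $_->text () } $container->descendants (qr/^var\d+$/);
    push @kept, "$type: @vars";
    $t->purge ();    # free the memory used by everything parsed so far
    return;
}
```

The saving comes from deciding early: the check runs as soon as the <type> element is complete, so the remaining content of an unwanted container is never built into a twig.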

      Thank you for your support.

      Ricardo