Hi,

I've written a script to parse a huge XML file, and right now I'm trying to speed it up by adding some filter options.

I've set a start_tag_handler whose purpose is to ignore the whole subtree when the current tag exists in my options hash. The first time the start_tag_handler sub is called everything works fine, but the second time it is called without the attribute hash.

Maybe I'm doing something wrong, or I don't understand it correctly. Below I've pasted a sample of what I'm trying to do. As you can see, I get an error the second time I try to print the attribute hash in the start_tag_handler.

Is there any monk who can help me?

use strict;
use warnings;
use XML::Twig;

my $xml = <<XML;
<doc>
  <fileHeader name='myTest'/>
  <SubNetwork id='CGRA01'>
    <MeContext id='Lisboa_1'>
      <data>
        <type>aaa</type>
        <var>a</var>
      </data>
    </MeContext>
    <MeContext id='Moscavide_2'>
      <data>
        <type>bbb</type>
        <var>b</var>
      </data>
    </MeContext>
  </SubNetwork>
  <SubNetwork id='CLOU01'>
    <MeContext id='Loures_3'>
      <data>
        <type>ccc</type>
        <var>c</var>
      </data>
    </MeContext>
    <MeContext id='Odivelas_3'>
      <data>
        <type>ddd</type>
        <var>d</var>
      </data>
    </MeContext>
  </SubNetwork>
</doc>
XML

my $SUBNETWORK_EXCLUDE_LIST = {
    #'CLOU01' => 1
    'CGRA01' => 1
};

my $start_tag_handlers = {
    'SubNetwork' => \&SubNetwork,
};

my $twig_roots = {
    'fileHeader' => \&fileHeader,
    'MeContext'  => 1
};

local our $twig_handlers = {
    'MeContext' => \&MeContext
};

my $twig = new XML::Twig(
    start_tag_handlers => $start_tag_handlers,
    twig_roots         => $twig_roots,
    twig_handlers      => $twig_handlers
);

print "\n### Parsing XML file ###\n";
my $root = $twig->parse( $xml );

sub SubNetwork {
    my ($twig, $tag, %att) = @_;

    print "\nSubNetwork: ".$att{id};
    if (exists $SUBNETWORK_EXCLUDE_LIST->{$att{id}}) {
        print " => Excluded\n";
        my $handler = sub {
            print $_[1]->att('id')." => Excluded\n";
            $_[1]->ignore()
        };
        $twig->setStartTagHandler ('MeContext', $handler);
    }
    else {
        my $handler = { 'SubNetwork' => \&SubNetwork };
        $twig->setStartTagHandlers ($handler);
    }
    return 0;
}

sub fileHeader {
    my ( $twig, $fileHeader) = @_;
    print "Name: ".$fileHeader->att('name')."\n";
    return 0;
}

sub MeContext {
    my ( $twig, $MeContext) = @_;
    print "MeContext: ".$MeContext->att('id')."\n";
    print "\tType: ".$MeContext->first_child('data')->first_child('type')->text."\n";
    print "\tVar: ".$MeContext->first_child('data')->first_child('var')->text."\n";
    $MeContext->purge;
    return 1;
}
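
To narrow this down, it may help to look at what the handler actually receives on each call. The following is only a minimal sketch, not a fix: it assumes (from my reading of the XML::Twig documentation) that a start tag handler can be called either with an element object or with a plain tag name followed by attribute key/value pairs. The helper name `describe_handler_args` and the `FakeElt` stand-in class are made up for illustration, so the sketch runs without XML::Twig.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper: summarise the arguments a start tag handler received.
sub describe_handler_args {
    my ($twig, $second, @rest) = @_;

    # Case 1: the second argument is an object with an att() method,
    # so the id has to be read through the object.
    if (blessed($second) && $second->can('att')) {
        return sprintf 'element object, id=%s', $second->att('id') // '(none)';
    }

    # Case 2: the second argument is a plain tag name, followed by
    # attribute key/value pairs.
    my %att = @rest;
    return sprintf 'tag=%s, id=%s', $second // '(undef)', $att{id} // '(none)';
}

# Stand-in element class (an assumption, not part of XML::Twig), used only
# to exercise the object branch.
package FakeElt {
    sub new { my ($class, %att) = @_; return bless {%att}, $class }
    sub att { my ($self, $name) = @_; return $self->{$name} }
}

print describe_handler_args(undef, 'SubNetwork', id => 'CGRA01'), "\n";
print describe_handler_args(undef, FakeElt->new(id => 'CLOU01')), "\n";
```

Calling something like `print describe_handler_args(@_), "\n";` as the first line of the SubNetwork sub should show which of the two forms arrives on the first and on the second call.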

Thanks in advance,

Ricardo Dinis

In reply to Strange start_tag_handlers behaviour using twig module by basalto
