basalto has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I've written a script to parse a huge XML file, and right now I'm trying to speed it up by adding some filter options.

I've set up a start_tag_handler whose purpose is to ignore the whole subtree when the current SubNetwork's id exists in my exclude hash. The first time the start_tag_handler sub is called, everything works fine, but the second time it is called without the attribute hash. Why?

Maybe I'm doing something wrong, or I don't understand it correctly. Below I've pasted a sample of what I'm trying to do. As you can see, I get an error the second time I try to print the attribute hash in the start_tag_handler.

Is there any monk who can help me?
use strict;
use warnings;
use XML::Twig;

my $xml = <<XML;
<doc>
  <fileHeader name='myTest'/>
  <SubNetwork id='CGRA01'>
    <MeContext id='Lisboa_1'>
      <data>
        <type>aaa</type>
        <var>a</var>
      </data>
    </MeContext>
    <MeContext id='Moscavide_2'>
      <data>
        <type>bbb</type>
        <var>b</var>
      </data>
    </MeContext>
  </SubNetwork>
  <SubNetwork id='CLOU01'>
    <MeContext id='Loures_3'>
      <data>
        <type>ccc</type>
        <var>c</var>
      </data>
    </MeContext>
    <MeContext id='Odivelas_3'>
      <data>
        <type>ddd</type>
        <var>d</var>
      </data>
    </MeContext>
  </SubNetwork>
</doc>
XML

# SubNetwork ids whose MeContext children should be skipped
my $SUBNETWORK_EXCLUDE_LIST = {
    #'CLOU01' => 1
    'CGRA01' => 1
};

my $start_tag_handlers = {
    'SubNetwork' => \&SubNetwork,
};

my $twig_roots = {
    'fileHeader' => \&fileHeader,
    'MeContext'  => 1
};

my $twig_handlers = {
    'MeContext' => \&MeContext
};

my $twig = XML::Twig->new(
    start_tag_handlers => $start_tag_handlers,
    twig_roots         => $twig_roots,
    twig_handlers      => $twig_handlers
);

print "\n### Parsing XML file ###\n";
my $root = $twig->parse( $xml );

# Start-tag handler for SubNetwork (outside twig_roots): expected to be
# called with ($twig, $tag, %att). On the second call @_ holds
# ($twig, $elt) instead, so %att is wrong and $att{id} is undef.
sub SubNetwork {
    my ($twig, $tag, %att) = @_;
    print "\nSubNetwork: ".$att{id};
    if (exists $SUBNETWORK_EXCLUDE_LIST->{$att{id}}) {
        print " => Excluded\n";
        # skip every MeContext until the handlers are reset
        my $handler = sub {
            print $_[1]->att('id')." => Excluded\n";
            $_[1]->ignore();
        };
        $twig->setStartTagHandler('MeContext', $handler);
    }
    else {
        # reset the start-tag handlers to just this one
        my $handler = { 'SubNetwork' => \&SubNetwork };
        $twig->setStartTagHandlers($handler);
    }
    return 0;
}

sub fileHeader {
    my ($twig, $fileHeader) = @_;
    print "Name: ".$fileHeader->att('name')."\n";
    return 0;
}

sub MeContext {
    my ($twig, $MeContext) = @_;
    print "MeContext: ".$MeContext->att('id')."\n";
    print "\tType: ".$MeContext->first_child('data')->first_child('type')->text."\n";
    print "\tVar: ".$MeContext->first_child('data')->first_child('var')->text."\n";
    $MeContext->purge;
    return 1;
}

Thanks in advance,

Ricardo Dinis
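While the cause is being tracked down, a start-tag handler can at least check which argument list it received instead of unpacking %att blindly (the replies in this thread report that the second call gets an XML::Twig::Elt instead of a tag name and attributes). The helper below is a minimal sketch: start_tag_id is a hypothetical name, and Stub::Elt is a hypothetical stand-in for XML::Twig::Elt that only implements att, so the example runs without a parser.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Return the 'id' attribute whichever way a start-tag handler was called:
#   ($twig, $tag, %att) - element outside twig_roots, not built yet
#   ($twig, $elt)       - an element object, as in the second call seen here
sub start_tag_id {
    my ( $twig, @args ) = @_;
    if ( @args == 1 && blessed( $args[0] ) ) {
        return $args[0]->att('id');
    }
    my ( $tag, %att ) = @args;
    return $att{id};
}

# Hypothetical stand-in for XML::Twig::Elt so the sketch runs on its own
package Stub::Elt;
sub new { my ( $class, %att ) = @_; return bless { %att }, $class }
sub att { my ( $self, $name ) = @_; return $self->{$name} }

package main;

print start_tag_id( undef, 'SubNetwork', id => 'CGRA01' ), "\n";
print start_tag_id( undef, Stub::Elt->new( id => 'CLOU01' ) ), "\n";
```

Inside a real handler the first argument would be the twig; it is undef here only because the helper never uses it.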

Re: Strange start_tag_handlers behaviour using twig module
by BaldManTom (Friar) on Apr 10, 2008 at 16:24 UTC

    Hi basalto,

    Not sure if this will help you or not, but I think it might have to do with the use of $twig->setStartTagHandlers. I used Data::Dumper to look at the arguments your SubNetwork routine was receiving. The first time, it received an XML::Twig object for $twig, a string for $tag, and a hash for %att. The sub then changes the start tag handlers using setStartTagHandler or setStartTagHandlers, depending on which branch of the if statement is taken. The second time SubNetwork is called, it receives only an XML::Twig object and an XML::Twig::Elt object. The question you're asking is "why", and unfortunately I have no idea, but perhaps my post will help spark something for you.

    Regards,

    Bald Man Tom

    Update: Sorry, I should have mentioned earlier that when I was playing around with your script, I commented out the if statement in the SubNetwork sub so that the start tag handlers wouldn't be redefined, and the second time SubNetwork was called, it received the "expected" information.

      Hi Bald Man Tom, that is exactly the strange behaviour I can't explain. If I don't change the start_tag_handlers during processing, everything works fine, but if I add a new entry to the handler hash, the start_tag_handler seems to be called like a twig_handler, with only two arguments (an XML::Twig and an XML::Twig::Elt).

      Regards,

      Basalto
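Since the arguments come through as expected when the handlers are left alone, one option is to avoid changing handlers during the parse altogether. The sketch below is an alternative, not an explanation of the behaviour: both start-tag handlers are registered once in the constructor and share a flag. The names %exclude, $skip_current and @processed are hypothetical. It assumes the calling conventions observed in this thread: SubNetwork, outside twig_roots, receives ($twig, $tag, %att), while MeContext, a twig root, receives ($twig, $elt).

```perl
use strict;
use warnings;
use XML::Twig;

my %exclude      = ( 'CGRA01' => 1 );  # hypothetical filter list
my $skip_current = 0;                  # true while inside an excluded SubNetwork
my @processed;                         # ids of MeContext elements actually handled

my $twig = XML::Twig->new(
    twig_roots         => { 'MeContext' => 1 },
    start_tag_handlers => {
        # SubNetwork is outside twig_roots: called with ($twig, $tag, %att)
        'SubNetwork' => sub {
            my ( $t, $tag, %att ) = @_;
            $skip_current = exists $exclude{ $att{id} } ? 1 : 0;
            return 1;
        },
        # MeContext is a twig root: called with ($twig, $elt)
        'MeContext' => sub {
            my ( $t, $elt ) = @_;
            $elt->ignore if $skip_current;   # drop the whole subtree
            return 1;
        },
    },
    twig_handlers => {
        'MeContext' => sub {
            my ( $t, $elt ) = @_;
            push @processed, $elt->att('id');
            $t->purge;                       # free what has been parsed so far
            return 1;
        },
    },
);

$twig->parse(<<'XML');
<doc>
  <SubNetwork id='CGRA01'>
    <MeContext id='Lisboa_1'><data><type>aaa</type></data></MeContext>
  </SubNetwork>
  <SubNetwork id='CLOU01'>
    <MeContext id='Loures_3'><data><type>ccc</type></data></MeContext>
    <MeContext id='Odivelas_3'><data><type>ddd</type></data></MeContext>
  </SubNetwork>
</doc>
XML

print "$_\n" for @processed;
```

With CGRA01 in the exclude hash, only the MeContext elements under CLOU01 should reach the twig handler.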