Binford has asked for the wisdom of the Perl Monks concerning the following question:

These handlers and roots in XML::Twig just keep confusing me. Every time I re-read the CPAN page I think I've understood it, and then I write code that either doesn't work or pukes thousands of lines of concatenated XML at me. Here's a subsection of the XML that includes all the relevant tags:

<?xml version="1.0" encoding="UTF-8"?>
<authenticationReports>
  <generatedTime>Tue Sep 29 07:07:34 PDT 2009</generatedTime>
  <appDeploymentFile name="app-deployment.properties.hklcp.trading">
    <application name="hk">
      <urlInfo><url>e/t/hk/accts_subscription</url><otherPrereq>HKPwdPreReq</otherPrereq></urlInfo>
      <urlInfo><url>e/t/hk/accts_forms</url><otherPrereq>HKPwdPreReq</otherPrereq></urlInfo>
      <urlInfo><url>e/t/hk/custtradingpage</url><otherPrereq>BasicPrereq</otherPrereq></urlInfo>
      <urlInfo><url>e/t/hk/accts_userinfo</url><otherPrereq>HKPwdPreReq</otherPrereq></urlInfo>
      <urlInfo><url>e/t/hk/headermain</url></urlInfo>
      <urlInfo><url>e/t/hk/custservicepage</url></urlInfo>
      <urlInfo><url>e/t/hk/accts_transfermoney</url><otherPrereq>HKPwdPreReq</otherPrereq></urlInfo>
      <urlInfo><url>e/t/hk/userprereq</url></urlInfo>
      <urlInfo><url>e/t/hk/indices_us</url></urlInfo>
      <urlInfo><url>e/t/hk/homeloggedmessage</url></urlInfo>
      <urlInfo><url>e/t/hk/lead</url></urlInfo>
      <urlInfo><url>e/t/hk/orderviewmin</url></urlInfo>
      <urlInfo><url>e/t/hk/accts_changelogin</url><otherPrereq>SessionPreReq</otherPrereq></urlInfo>
    </application>
    <application name="intl">
      <urlInfo><url>e/t/intl/quotesandresearch</url></urlInfo>
      <urlInfo><url>e/t/intl/intltablesubnavviewcomponent</url></urlInfo>
      <urlInfo><url>e/t/intl/intltablemetaviewcomponent</url></urlInfo>
      <urlInfo><url>e/t/intl/disclaimer</url></urlInfo>
      <urlInfo><url>e/t/intl/headermain</url></urlInfo>
      <urlInfo><url>e/t/intl/indices_us</url></urlInfo>
      <urlInfo><url>e/t/intl/lead</url></urlInfo>
      <urlInfo><url>e/t/intl/selectlanguage</url></urlInfo>
      <urlInfo><url>e/t/intl/get-screen</url><otherPrereq>BasicPrereq</otherPrereq></urlInfo>
      <urlInfo><url>e/t/intl/page_f</url></urlInfo>
      <urlInfo><url>e/t/intl/basicprereq</url></urlInfo>
      <urlInfo><url>e/t/intl/page</url><otherPrereq>BasicPrereq</otherPrereq></urlInfo>
    </application>
  </appDeploymentFile>
</authenticationReports>
At first I was told to just get the <appDeploymentFile> name and, with some munging, pair it with each <url> to produce a hash. Someone got me started in an earlier Seekers post with the following code:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
use Data::Dumper;

my $AFXML='xmlexample.xml';

#the hashes of the appropriate data files
our %AFURLS;
our %SMG = ();
our %REALMS;
our %SMCUST;

# Read the XML from the maven plugin for AF that delineates the URLs
sub AFXMLtoEM {
    print "Slurping $AFXML....";
    my $TWIG = new XML::Twig ( twig_handlers => {'appDeploymentFile' => \&parseURL} );
    #my $TWIG = new XML::Twig ( twig_handlers => {'appDeploymentFile/application' => \&parseURL} );
    $TWIG -> parsefile ($AFXML) or die "Can't open $AFXML\n" ;
    #$TWIG->flush;
    # Now we want to change every value from the XML name to an EM instance identifier
    #print Dumper(\%AFURLS); exit 1;
    while ((my $K, my $ITEM) = each %AFURLS) {
        my ($G1,$G2,$APP,$INST) = split /\./,$ITEM,4;
        unless ($APP eq "") {
            $ITEM = "prd:" . $APP . ":web:" . $INST;
        } #Cheesy kludge - fix when Durai confirms
        $AFURLS{$K} = $ITEM;
    }
    print scalar keys %AFURLS, " records slurped in.\n";
}

sub parseURL {
    my ($T, $ADEP) = @_;
    #print Dumper($T) ."," . Dumper($ADEP) ."\n";
    my $NAME= $ADEP->att('name');
    print $ADEP->first_child('application')->text() . "\n";
    #print "$NAME\n";
    for my $URLI ($ADEP->first_child('application')->children('urlInfo')) {
    # for my $URLI ($ADEP->children('application')) {
    #    $NAME2 = $ADEP->att('name');
        #print "$NAME2\n";
        # leading slash added for matching SM filters
        $AFURLS{ "/" . $URLI->first_child('url')->text() } = $NAME;
    #    $AFURLS{ "/" . $URLI->first_child('urlInfo')->children('url')->text() } = $NAME . "-" . $NAME2;
    }
}

#
# Main program START
#
AFXMLtoEM();
print Dumper(\%AFURLS);
But that just gets me a hash with the <url> as the key and my munged data as the value. It also doesn't work for the second (or any later) <application> element. I need ALL of the data in a structure that's usable for post-processing, even if at this point (45 hours and counting on this) I have to output it all to a CSV file and then reparse every entry line by line. I've used XML::Simple before, but for one thing this XML is more complex than anything I've parsed before, and for another it turns the <application> name attribute into a hash key, so I see no way to get at it programmatically. :( I'm OK with combining the application value with the appDeploymentFile value, and the url with the otherPrereq value, and then handling them appropriately in the hash later with a split or some such when I'm ready to do things with it. Thanks in advance!
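A minimal sketch of one way to collect everything asked for above (every <application>, each <url>, and its optional <otherPrereq>) in a single pass, assuming XML::Twig is installed. The question's parseURL only looks at first_child('application'), which is why the second application is missed, and printing ->text() on it dumps the concatenated text of the whole element. Here the handler fires on each complete <urlInfo> instead and walks up with parent to read the application and appDeploymentFile name attributes; purge then frees what has already been processed. The %data layout (file, then application, then URL, with '' for a missing otherPrereq) and the trimmed sample XML are illustrative choices, not something from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;    # assumed installed (not a core module)

# Trimmed sample of the XML from the question; for the real file,
# $twig->parsefile('xmlexample.xml') would replace $twig->parse($xml).
my $xml = <<'__XML__';
<?xml version="1.0" encoding="UTF-8"?>
<authenticationReports>
  <appDeploymentFile name="app-deployment.properties.hklcp.trading">
    <application name="hk">
      <urlInfo><url>e/t/hk/accts_forms</url><otherPrereq>HKPwdPreReq</otherPrereq></urlInfo>
      <urlInfo><url>e/t/hk/headermain</url></urlInfo>
    </application>
    <application name="intl">
      <urlInfo><url>e/t/intl/page</url><otherPrereq>BasicPrereq</otherPrereq></urlInfo>
    </application>
  </appDeploymentFile>
</authenticationReports>
__XML__

# Illustrative layout: $data{file}{application}{"/url"} = otherPrereq ('' if none)
my %data;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once per <urlInfo>, after its children have been parsed,
        # under every <application>, not just the first one.
        'appDeploymentFile/application/urlInfo' => sub {
            my ( $t, $info ) = @_;
            my $app = $info->parent;    # enclosing <application>
            my $dep = $app->parent;     # enclosing <appDeploymentFile>
            my $url = '/' . $info->first_child_text('url');
            # first_child_text returns '' when there is no <otherPrereq>
            $data{ $dep->att('name') }{ $app->att('name') }{$url} =
              $info->first_child_text('otherPrereq');
            # Drop the finished elements; the still-open ancestors and
            # their attributes stay available for the next <urlInfo>.
            $t->purge;
            return 1;
        },
    },
);
$twig->parse($xml);

# One comma-separated line per URL (safe only while no value contains a comma)
for my $file ( sort keys %data ) {
    for my $app ( sort keys %{ $data{$file} } ) {
        for my $url ( sort keys %{ $data{$file}{$app} } ) {
            print join( ',', $file, $app, $url, $data{$file}{$app}{$url} ), "\n";
        }
    }
}
```

Run against the full file, this gives 25 lines, one per <urlInfo>, covering both hk and intl. Keeping the three names as separate hash levels avoids having to join and split them later.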

Re: XML::Twig n00b
by Anonymous Monk on Oct 23, 2009 at 03:42 UTC
#!/usr/bin/perl --
use strict;
use warnings;
use XML::Twig;

my $xml = <<'__XML__';
<?xml version="1.0" encoding="UTF-8"?>
<authenticationReports>
  <generatedTime>Tue Sep 29 07:07:34 PDT 2009</generatedTime>
  <appDeploymentFile name="app-deployment.properties.hklcp.trading">
    <application name="hk">
      <urlInfo><url>e/t/hk/accts_subscription</url><otherPrereq>HKPwdPreReq</otherPrereq></urlInfo>
      <urlInfo><url>e/t/hk/accts_forms</url><otherPrereq>HKPwdPreReq</otherPrereq></urlInfo>
      <urlInfo><url>e/t/hk/custtradingpage</url><otherPrereq>BasicPrereq</otherPrereq></urlInfo>
      <urlInfo><url>e/t/hk/accts_userinfo</url><otherPrereq>HKPwdPreReq</otherPrereq></urlInfo>
      <urlInfo><url>e/t/hk/headermain</url></urlInfo>
      <urlInfo><url>e/t/hk/custservicepage</url></urlInfo>
      <urlInfo><url>e/t/hk/accts_transfermoney</url><otherPrereq>HKPwdPreReq</otherPrereq></urlInfo>
      <urlInfo><url>e/t/hk/userprereq</url></urlInfo>
      <urlInfo><url>e/t/hk/indices_us</url></urlInfo>
      <urlInfo><url>e/t/hk/homeloggedmessage</url></urlInfo>
      <urlInfo><url>e/t/hk/lead</url></urlInfo>
      <urlInfo><url>e/t/hk/orderviewmin</url></urlInfo>
      <urlInfo><url>e/t/hk/accts_changelogin</url><otherPrereq>SessionPreReq</otherPrereq></urlInfo>
    </application>
    <application name="intl">
      <urlInfo><url>e/t/intl/quotesandresearch</url></urlInfo>
      <urlInfo><url>e/t/intl/intltablesubnavviewcomponent</url></urlInfo>
      <urlInfo><url>e/t/intl/intltablemetaviewcomponent</url></urlInfo>
      <urlInfo><url>e/t/intl/disclaimer</url></urlInfo>
      <urlInfo><url>e/t/intl/headermain</url></urlInfo>
      <urlInfo><url>e/t/intl/indices_us</url></urlInfo>
      <urlInfo><url>e/t/intl/lead</url></urlInfo>
      <urlInfo><url>e/t/intl/selectlanguage</url></urlInfo>
      <urlInfo><url>e/t/intl/get-screen</url><otherPrereq>BasicPrereq</otherPrereq></urlInfo>
      <urlInfo><url>e/t/intl/page_f</url></urlInfo>
      <urlInfo><url>e/t/intl/basicprereq</url></urlInfo>
      <urlInfo><url>e/t/intl/page</url><otherPrereq>BasicPrereq</otherPrereq></urlInfo>
    </application>
  </appDeploymentFile>
</authenticationReports>
__XML__

{
    my $name = "";
    my %apps;
    my %urls;
    my $t = XML::Twig->new(
        start_tag_handlers => {
            'appDeploymentFile/application' => sub {
                my ( $twig, $tag, %att ) = @_;
                #~ %att is only filled in when using twig_roots
                #~ with twig_handlers the args are ($twig, $elt) and $_ is the element
                #~ twig_roots uses less memory, since only the matched elements are built
                $name = $att{name} || $_->att('name');
                return;
            },
        },
        twig_handlers => {
        #~ twig_roots => {
            'appDeploymentFile/application/urlInfo/url' => sub {
                #~ warn "root _ = $_ |||| @_ ";
                #~ warn $_->text;
                $apps{$name}{ "/" . $_->text }++;
                $urls{ "/" . $_->text } = $name;
                return;
            },
        },
    );
    $t->parse($xml);
    undef $t;
    use Data::Dumper();
    print Data::Dumper->new([ \%apps, \%urls ])->Indent(1)->Dump;
}
__END__
$VAR1 = {
  'intl' => {
    '/e/t/intl/lead' => 1,
    '/e/t/intl/page_f' => 1,
    '/e/t/intl/get-screen' => 1,
    '/e/t/intl/basicprereq' => 1,
    '/e/t/intl/indices_us' => 1,
    '/e/t/intl/intltablesubnavviewcomponent' => 1,
    '/e/t/intl/selectlanguage' => 1,
    '/e/t/intl/page' => 1,
    '/e/t/intl/quotesandresearch' => 1,
    '/e/t/intl/intltablemetaviewcomponent' => 1,
    '/e/t/intl/disclaimer' => 1,
    '/e/t/intl/headermain' => 1
  },
  'hk' => {
    '/e/t/hk/orderviewmin' => 1,
    '/e/t/hk/indices_us' => 1,
    '/e/t/hk/accts_subscription' => 1,
    '/e/t/hk/userprereq' => 1,
    '/e/t/hk/custtradingpage' => 1,
    '/e/t/hk/custservicepage' => 1,
    '/e/t/hk/accts_changelogin' => 1,
    '/e/t/hk/accts_transfermoney' => 1,
    '/e/t/hk/homeloggedmessage' => 1,
    '/e/t/hk/lead' => 1,
    '/e/t/hk/accts_forms' => 1,
    '/e/t/hk/accts_userinfo' => 1,
    '/e/t/hk/headermain' => 1
  }
};
$VAR2 = {
  '/e/t/intl/lead' => 'intl',
  '/e/t/intl/page_f' => 'intl',
  '/e/t/hk/userprereq' => 'hk',
  '/e/t/hk/custservicepage' => 'hk',
  '/e/t/intl/indices_us' => 'intl',
  '/e/t/intl/selectlanguage' => 'intl',
  '/e/t/intl/intltablemetaviewcomponent' => 'intl',
  '/e/t/intl/quotesandresearch' => 'intl',
  '/e/t/intl/headermain' => 'intl',
  '/e/t/intl/disclaimer' => 'intl',
  '/e/t/hk/accts_forms' => 'hk',
  '/e/t/hk/accts_userinfo' => 'hk',
  '/e/t/hk/accts_subscription' => 'hk',
  '/e/t/hk/indices_us' => 'hk',
  '/e/t/hk/orderviewmin' => 'hk',
  '/e/t/intl/get-screen' => 'intl',
  '/e/t/hk/custtradingpage' => 'hk',
  '/e/t/intl/basicprereq' => 'intl',
  '/e/t/hk/accts_changelogin' => 'hk',
  '/e/t/hk/accts_transfermoney' => 'hk',
  '/e/t/hk/homeloggedmessage' => 'hk',
  '/e/t/intl/intltablesubnavviewcomponent' => 'intl',
  '/e/t/intl/page' => 'intl',
  '/e/t/hk/lead' => 'hk',
  '/e/t/hk/headermain' => 'hk'
};
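The reply fills two hashes that hold the same application/URL pairs seen from opposite sides. A minimal sketch, using a few entries hand-copied from the dump above as sample data, of rebuilding %urls from %apps after parsing, so a handler would only need to fill one of them. It also sorts the keys: the order in the Dumper output is just Perl's internal hash order and carries no meaning (Data::Dumper's Sortkeys(1) option gives a sorted dump).

```perl
use strict;
use warnings;

# A few entries copied from the %apps dump above:
# application name => { "/url" => number of times seen }
my %apps = (
    hk   => { '/e/t/hk/lead' => 1, '/e/t/hk/headermain' => 1 },
    intl => { '/e/t/intl/lead' => 1 },
);

# Invert into the %urls shape ("/url" => application name). If one URL
# were listed under two applications, only one of them would survive.
my %urls;
for my $app ( keys %apps ) {
    $urls{$_} = $app for keys %{ $apps{$app} };
}

# Sort for a stable listing; hash order itself carries no meaning.
print "$_ => $urls{$_}\n" for sort keys %urls;
```

Here the inversion is lossless because every URL path already contains its application name.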
      I've got to study that more, but can I put the otherPrereq tag in the hash? THANKS though...
        Sure, but you'll have to decide on a data structure.
#!/usr/bin/perl --
use strict;
use warnings;
use XML::Twig;

my $xml = <<'__XML__';
<?xml version="1.0" encoding="UTF-8"?>
<authenticationReports>
  <generatedTime>Tue Sep 29 07:07:34 PDT 2009</generatedTime>
  <appDeploymentFile name="app-deployment.properties.hklcp.trading">
    <application name="hk">
      <urlInfo><url>e/t/hk/accts_subscription</url><otherPrereq>HKPwdPreReq</otherPrereq></urlInfo>
      <urlInfo><url>e/t/hk/accts_forms</url><otherPrereq>HKPwdPreReq</otherPrereq></urlInfo>
      <urlInfo><url>e/t/hk/custtradingpage</url><otherPrereq>BasicPrereq</otherPrereq></urlInfo>
      <urlInfo><url>e/t/hk/accts_userinfo</url><otherPrereq>HKPwdPreReq</otherPrereq></urlInfo>
      <urlInfo><url>e/t/hk/headermain</url></urlInfo>
      <urlInfo><url>e/t/hk/custservicepage</url></urlInfo>
      <urlInfo><url>e/t/hk/accts_transfermoney</url><otherPrereq>HKPwdPreReq</otherPrereq></urlInfo>
      <urlInfo><url>e/t/hk/userprereq</url></urlInfo>
      <urlInfo><url>e/t/hk/indices_us</url></urlInfo>
      <urlInfo><url>e/t/hk/homeloggedmessage</url></urlInfo>
      <urlInfo><url>e/t/hk/lead</url></urlInfo>
      <urlInfo><url>e/t/hk/orderviewmin</url></urlInfo>
      <urlInfo><url>e/t/hk/accts_changelogin</url><otherPrereq>SessionPreReq</otherPrereq></urlInfo>
    </application>
    <application name="intl">
      <urlInfo><url>e/t/intl/quotesandresearch</url></urlInfo>
      <urlInfo><url>e/t/intl/intltablesubnavviewcomponent</url></urlInfo>
      <urlInfo><url>e/t/intl/intltablemetaviewcomponent</url></urlInfo>
      <urlInfo><url>e/t/intl/disclaimer</url></urlInfo>
      <urlInfo><url>e/t/intl/headermain</url></urlInfo>
      <urlInfo><url>e/t/intl/indices_us</url></urlInfo>
      <urlInfo><url>e/t/intl/lead</url></urlInfo>
      <urlInfo><url>e/t/intl/selectlanguage</url></urlInfo>
      <urlInfo><url>e/t/intl/get-screen</url><otherPrereq>BasicPrereq</otherPrereq></urlInfo>
      <urlInfo><url>e/t/intl/page_f</url></urlInfo>
      <urlInfo><url>e/t/intl/basicprereq</url></urlInfo>
      <urlInfo><url>e/t/intl/page</url><otherPrereq>BasicPrereq</otherPrereq></urlInfo>
    </application>
  </appDeploymentFile>
</authenticationReports>
__XML__

# REUSING $xml
# SEE http://search.cpan.org/perldoc?XML::Twig#xparse
# $xml = appDeploymentFile('xmlexample.xml');
$xml = appDeploymentFile($xml);
use Data::Dumper();
print Data::Dumper->new([ $xml ])->Indent(1)->Dump;

sub appDeploymentFile {
    my( $xml ) = @_;
    my $url = "";
    my %urls;
    my $appFileName = "";
    my $t = XML::Twig->new(
        start_tag_handlers => {
            'appDeploymentFile' => sub {
                my ( $twig, $tag, %att ) = @_;
                #~ %att is only filled in when using twig_roots
                #~ with twig_handlers the args are ($twig, $elt) and $_ is the element
                #~ twig_roots uses less memory, since only the matched elements are built
                $appFileName = $att{name} || $_->att('name');
                return;
            },
        },
        twig_handlers => {
            # <url> comes before <otherPrereq> inside each <urlInfo>,
            # so $url still holds the matching key when the prereq arrives
            'appDeploymentFile/application/urlInfo/url' => sub {
                $url = "/" . $_->text;
                $urls{ $url } = "";
                return;
            },
            'appDeploymentFile/application/urlInfo/otherPrereq' => sub {
                $urls{ $url } = $_->text;
                return;
            },
        },
    );
    $t->xparse($xml);
    undef $t;
    # Build the EM instance id from the last two dot-separated parts of the
    # file name ('' if it doesn't have that shape, instead of a stale $url)
    my $instance = $appFileName =~ /\.([^.]+?)\.([^.]+?)$/
        ? join( ":", "prd", $1, "web", $2 )
        : "";
    return [ $appFileName, $instance, \%urls ];
}
__END__
$VAR1 = [
  'app-deployment.properties.hklcp.trading',
  'prd:hklcp:web:trading',
  {
    '/e/t/intl/lead' => '',
    '/e/t/intl/page_f' => '',
    '/e/t/hk/userprereq' => '',
    '/e/t/hk/custservicepage' => '',
    '/e/t/intl/indices_us' => '',
    '/e/t/intl/selectlanguage' => '',
    '/e/t/intl/intltablemetaviewcomponent' => '',
    '/e/t/intl/quotesandresearch' => '',
    '/e/t/intl/headermain' => '',
    '/e/t/intl/disclaimer' => '',
    '/e/t/hk/accts_forms' => 'HKPwdPreReq',
    '/e/t/hk/accts_userinfo' => 'HKPwdPreReq',
    '/e/t/hk/accts_subscription' => 'HKPwdPreReq',
    '/e/t/hk/indices_us' => '',
    '/e/t/hk/orderviewmin' => '',
    '/e/t/intl/get-screen' => 'BasicPrereq',
    '/e/t/hk/custtradingpage' => 'BasicPrereq',
    '/e/t/intl/basicprereq' => '',
    '/e/t/hk/accts_changelogin' => 'SessionPreReq',
    '/e/t/hk/accts_transfermoney' => 'HKPwdPreReq',
    '/e/t/hk/homeloggedmessage' => '',
    '/e/t/intl/intltablesubnavviewcomponent' => '',
    '/e/t/intl/page' => 'BasicPrereq',
    '/e/t/hk/lead' => '',
    '/e/t/hk/headermain' => ''
  }
];
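The question proposed joining values together now and splitting them apart later. A minimal sketch of that on the array shape appDeploymentFile() returns, using a hand-copied two-entry sample; the '|' separator is an assumption and must never occur inside a value. The detail worth showing is split's -1 limit: without it, trailing empty fields are dropped, so a URL with no otherPrereq would come back as two fields instead of three.

```perl
use strict;
use warnings;

# Sample in the shape appDeploymentFile() returns:
# [ file name, EM instance id, { "/url" => otherPrereq, or '' if none } ]
my $result = [
    'app-deployment.properties.hklcp.trading',
    'prd:hklcp:web:trading',
    { '/e/t/hk/accts_forms' => 'HKPwdPreReq', '/e/t/hk/headermain' => '' },
];
my ( $file, $instance, $prereqs ) = @$result;

# Join now: one key per URL, the other values packed into one string.
# Assumes '|' never occurs inside a value.
my %flat;
for my $url ( keys %$prereqs ) {
    $flat{$url} = join '|', $file, $instance, $prereqs->{$url};
}

# Split later: a limit of -1 keeps trailing empty fields, so a URL
# without an otherPrereq still comes back as three fields.
for my $url ( sort keys %flat ) {
    my @fields = split /\|/, $flat{$url}, -1;
    print join( ',', $url, @fields[ 1, 2 ] ), "\n";
}
```

This packing loses nothing as long as the separator never appears in the data; a nested hash avoids the separator question entirely.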