The handlers and roots in XML::Twig keep confusing me. Every time I re-read the CPAN page I think I understand it, and then I write code that either doesn't work or pukes thousands of lines of concatenated XML at me. Here's a subsection of the XML that includes all the relevant tags:

<?xml version="1.0" encoding="UTF-8"?>
<authenticationReports>
  <generatedTime>Tue Sep 29 07:07:34 PDT 2009</generatedTime>
  <appDeploymentFile name="app-deployment.properties.hklcp.trading">
    <application name="hk">
      <urlInfo> <url>e/t/hk/accts_subscription</url> <otherPrereq>HKPwdPreReq</otherPrereq> </urlInfo>
      <urlInfo> <url>e/t/hk/accts_forms</url> <otherPrereq>HKPwdPreReq</otherPrereq> </urlInfo>
      <urlInfo> <url>e/t/hk/custtradingpage</url> <otherPrereq>BasicPrereq</otherPrereq> </urlInfo>
      <urlInfo> <url>e/t/hk/accts_userinfo</url> <otherPrereq>HKPwdPreReq</otherPrereq> </urlInfo>
      <urlInfo> <url>e/t/hk/headermain</url> </urlInfo>
      <urlInfo> <url>e/t/hk/custservicepage</url> </urlInfo>
      <urlInfo> <url>e/t/hk/accts_transfermoney</url> <otherPrereq>HKPwdPreReq</otherPrereq> </urlInfo>
      <urlInfo> <url>e/t/hk/userprereq</url> </urlInfo>
      <urlInfo> <url>e/t/hk/indices_us</url> </urlInfo>
      <urlInfo> <url>e/t/hk/homeloggedmessage</url> </urlInfo>
      <urlInfo> <url>e/t/hk/lead</url> </urlInfo>
      <urlInfo> <url>e/t/hk/orderviewmin</url> </urlInfo>
      <urlInfo> <url>e/t/hk/accts_changelogin</url> <otherPrereq>SessionPreReq</otherPrereq> </urlInfo>
    </application>
    <application name="intl">
      <urlInfo> <url>e/t/intl/quotesandresearch</url> </urlInfo>
      <urlInfo> <url>e/t/intl/intltablesubnavviewcomponent</url> </urlInfo>
      <urlInfo> <url>e/t/intl/intltablemetaviewcomponent</url> </urlInfo>
      <urlInfo> <url>e/t/intl/disclaimer</url> </urlInfo>
      <urlInfo> <url>e/t/intl/headermain</url> </urlInfo>
      <urlInfo> <url>e/t/intl/indices_us</url> </urlInfo>
      <urlInfo> <url>e/t/intl/lead</url> </urlInfo>
      <urlInfo> <url>e/t/intl/selectlanguage</url> </urlInfo>
      <urlInfo> <url>e/t/intl/get-screen</url> <otherPrereq>BasicPrereq</otherPrereq> </urlInfo>
      <urlInfo> <url>e/t/intl/page_f</url> </urlInfo>
      <urlInfo> <url>e/t/intl/basicprereq</url> </urlInfo>
      <urlInfo> <url>e/t/intl/page</url> <otherPrereq>BasicPrereq</otherPrereq> </urlInfo>
    </application>
  </appDeploymentFile>
</authenticationReports>

At first I was told to just get the <appDeploymentFile> and, with some munging, use it with <url> to produce a hash value. Someone got me started in an earlier Seekers post with the following code:

#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
use Data::Dumper;

my $AFXML='xmlexample.xml';

#the hashes of the appropriate data files
our %AFURLS;
our %SMG = ();
our %REALMS;
our %SMCUST;

# Read the XML from the maven plugin for AF that delineates the URL's
sub AFXMLtoEM {
    print "Slurping $AFXML....";
    my $TWIG = new XML::Twig ( twig_handlers => {'appDeploymentFile' => \&parseURL} );
    #my $TWIG = new XML::Twig ( twig_handlers => {'appDeploymentFile/application' => \&parseURL} );
    $TWIG -> parsefile ($AFXML) or die "Can't open $AFXML\n" ;
    #$TWIG->flush;

    # Now we want to change every value from the XML name to an EM instance identifier
    #print Dumper(\%AFURLS); exit 1;
    while ((my $K, my $ITEM) = each %AFURLS) {
        my ($G1,$G2,$APP,$INST) = split /\./,$ITEM,4;
        unless ($APP eq "") { $ITEM = "prd:" . $APP . ":web:" . $INST; } #Cheesy kludge - fix when Durai confirms
        $AFURLS{$K} = $ITEM;
    }
    print scalar keys %AFURLS, " records slurped in.\n";
}

sub parseURL {
    my ($T, $ADEP) = @_;
    #print Dumper($T) ."," . Dumper($ADEP) ."\n";
    my $NAME= $ADEP->att('name');
    print $ADEP->first_child('application')->text() . "\n";
    #print "$NAME\n";
    for my $URLI ($ADEP->first_child('application')->children('urlInfo')) {
    # for my $URLI ($ADEP->children('application')) {
        # $NAME2 = $ADEP->att('name');
        #print "$NAME2\n";
        # leading slash added for matching SM filters
        $AFURLS{ "/" . $URLI->first_child('url')->text() } = $NAME;
        # $AFURLS{ "/" . $URLI->first_child('urlInfo')->children('url')->text() } = $NAME . "-" . $NAME2;
    }
}

#
# Main program START
#
AFXMLtoEM();
print Dumper(\%AFURLS);

but that just gets me a hash with <url> as the key and my munged data as the value. It also doesn't work for the second (or any later) set of <application> tags.

I need ALL of the data in a structure that's usable for post-processing, even if at this point (45 hours and counting on this) I have to output it all to a CSV file and then reparse every entry line by line. I've used XML::Simple before, but for one thing this is more complex XML than I have ever parsed before, and it turns the <application> tag NAME into the attribute name, so there's no way to get it programmatically. :(

I am OK with adding the application value to the appDeploymentFile value, and the url to the otherPrereq value, and then handling them appropriately in the hash later with a split or some such when I am ready to do things with it.

Thanks in advance!
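
For concreteness, the kind of structure described above could be built along the lines of the sketch below. It is only a sketch under assumptions, not a confirmed fix: it puts the handler on application instead of appDeploymentFile, so the handler fires once per <application> element, and it reads the file name from the parent element. The names @records and collect_application and the hash keys file, application, url and prereq are made up for illustration, and the XML is a trimmed copy of the sample inlined so the script runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Trimmed copy of the sample XML, inlined so the sketch runs standalone.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<authenticationReports>
  <appDeploymentFile name="app-deployment.properties.hklcp.trading">
    <application name="hk">
      <urlInfo> <url>e/t/hk/accts_forms</url> <otherPrereq>HKPwdPreReq</otherPrereq> </urlInfo>
      <urlInfo> <url>e/t/hk/headermain</url> </urlInfo>
    </application>
    <application name="intl">
      <urlInfo> <url>e/t/intl/page</url> <otherPrereq>BasicPrereq</otherPrereq> </urlInfo>
    </application>
  </appDeploymentFile>
</authenticationReports>
XML

# Hypothetical target structure: one hash per <urlInfo>.
my @records;

# Called once for every <application> element, after it is fully parsed.
sub collect_application {
    my ($twig, $app) = @_;

    # The enclosing <appDeploymentFile> is the parent of <application>.
    my $file = $app->parent('appDeploymentFile')->att('name');

    for my $info ($app->children('urlInfo')) {
        push @records, {
            file        => $file,
            application => $app->att('name'),
            # leading slash kept, as in the original code
            url         => '/' . $info->first_child_text('url'),
            # first_child_text returns '' when there is no <otherPrereq>
            prereq      => $info->first_child_text('otherPrereq'),
        };
    }
}

XML::Twig->new(
    twig_handlers => { application => \&collect_application },
)->parse($xml);

# One CSV-ish line per record, ready for post-processing.
print join(',', @{$_}{qw(file application url prereq)}), "\n" for @records;
```

Keeping each field as its own hash key avoids having to concatenate values and split them apart again later. For the real file, parse($xml) would be swapped for parsefile on the actual path.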

In reply to XML::Twig n00b by Binford
