in reply to Re: Using XML Twig to summarize a large file
in thread Using XML Twig to summarize a large file

You are correct. I cut and pasted and then entered the populate sub. It is my understanding that XML::Twig sets up handlers that are called for each element in the XML when you go to parse the file. The XML I'm dealing with is structured in a highly peculiar way. There's a brief header with information that is irrelevant to what I'm using the data for. All remaining data is under a parent titled "Data." Under that parent are roughly 500 children, each of which is a product with roughly 300 properties set up as children of its own. The problem for me is that those properties aren't uniform. 60 items may have a listing for "number of pages" while others will have "number of tracks." Each item is massive, so here's a brief snippet.
<is:ItemMaster>
  <is:ItemMasterHeader>
    <oi:ItemID agencyRole="Product_Number">some_number</oi:ItemID>
    <oi:ItemID agencyRole="Prefix_Number">some_number</oi:ItemID>
    <oi:ItemID agencyRole="Stock_Number">some_number</oi:ItemID>
    <oi:ManufacturerItemID>some_manufacturer_ID</oi:ManufacturerItemID>
    <is:Classification type="Group"></is:Classification>
    <is:Classification></is:Classification>
Each of these ItemMasters has around eight children, and those children have anywhere from one to twenty-four children of their own. Because the children are not uniform, this is giving me headaches. Here's my first revision:
#!/bin/perl
use XML::Twig;
%Items=();
my $twig=XML::Twig->new(
    twig_handlers => {
        populate=> sub {
            while (<>) {
                if (%Items !~ m/"<us:"|"<oa:"(.*)/) { $Items{$1} =1}
                else {$Items{$2} =($Items{$1}+(/$1/)) }
            };                        #If element is not in the hash, adds it
        },                            #If element is in the hash, adds the number of matches
        div => sub { $_[0]->purge; }, # free memory
    },
);
$twig->parsefile( '500syncItemMaster.xml'); # build it
$twig->purge;                               # clear end of document from memory
print %Items;                               # output the twig
Now when I print I get nothing. I tried a test run and it seems like the handlers are not getting called at all.

Re^3: Using XML Twig to summarize a large file
by mirod (Canon) on Nov 07, 2007 at 16:10 UTC

    A handler is called when the associated expression triggers it, so what you wrote triggers a handler on every populate element. I don't see any element by that name in the XML, so the handler will not be called. Is there anything wrong with the pyx code I posted below? Or is there any specific reason why you want to use XML::Twig even though it is not the best suited to the task?
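    To make the point about handler names concrete, here is a minimal sketch of what counting elements with XML::Twig could look like. It relies on the special `_all_` handler key, which fires for every element regardless of its name. The sample XML and the %count hash are made up for illustration and only stand in for the real file:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample standing in for the real 500-item file.
my $xml = <<'XML';
<Data>
  <is:ItemMaster>
    <is:ItemMasterHeader>
      <oi:ItemID agencyRole="Product_Number">1</oi:ItemID>
      <oi:ItemID agencyRole="Stock_Number">2</oi:ItemID>
    </is:ItemMasterHeader>
  </is:ItemMaster>
</Data>
XML

my %count;    # element name => number of occurrences

my $twig = XML::Twig->new(
    twig_handlers => {
        # The key is matched against element names; '_all_' matches
        # every element, so this sub runs once per element.
        _all_ => sub {
            my ($t, $elt) = @_;
            $count{ $elt->tag }++;
        },
    },
);

$twig->parse($xml);    # use parsefile('some_file.xml') for a file on disk

print "$_ used $count{$_} time(s)\n" for sort keys %count;
```

    A handler keyed on populate would never run against this sample, because no element has that name. For a really large file you would also want to call purge from a handler on the repeating element, as in your div handler, so the parsed part is freed as you go.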

      Thanks! The code you posted gave me the following error.
      String found where operator expected at line 2 near "pyx 'bigXML.xml | perl -n -e'"
      Scalar found where operator expected at line 2 near "BigXML.xml | perl -n -e '$nb" (missing operator before $nb?)
      I'm sort of learning Perl crash-course style and wanted to understand why what I wrote bombed so dismally. Twig is mildly complicated, and it seemed like getting a handle on a complex module might be a good (though challenging) way to jump right in. I got handed a Camel book and told "Go learn this. We've got faith in you." Exciting and intimidating at the same time. I'll take any help I can get. Is this a bad way to go about learning Perl?

        What you wrote bombed because frankly I don't think you have understood much of the docs for XML::Twig (which is probably understandable if you don't know any Perl). There are much easier ways to start with Perl than starting with a complex module, especially when using it for something it's not really made for.

        As for the problem with pyx, it seems that a cut'n'paste did not quite work as planned, so here is a hopefully better version:

        pyx bigXML.xml | perl -n -e '$nb{$1}++ if( m/\A\((.*)\n/); END { map { print "$_ used $nb{$_} time(s)\n";} sort keys %nb;}'

        This works on *nix; Windows would probably require different quotes. If you want to understand what's going on you can use perldoc XML::PYX for info about PYX, perldoc perlrun to understand the -n and -e options, and perldoc -f map and perldoc -f sort.
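        If the one-liner is hard to read, here is roughly the same logic spelled out as a small script. The -n option wraps the code in a while (<>) loop over the input lines, and the END block runs once the input is exhausted. In PYX each start tag is a line beginning with "(" followed by the element name. The count_start_tags sub and the sample lines are made up for illustration and are not part of XML::PYX:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count PYX start-tag lines; each one is "(" followed by the element name.
# Returns a reference to a hash of element name => number of occurrences.
sub count_start_tags {
    my @lines = @_;
    my %nb;
    for my $line (@lines) {
        # Same regex as the one-liner: capture everything after the "("
        $nb{$1}++ if $line =~ m/\A\((.*)\n/;
    }
    return \%nb;
}

# Hypothetical PYX lines, standing in for what pyx would print for the file.
my @pyx = (
    "(Data\n",
    "(is:ItemMaster\n",
    "(oi:ItemID\n",
    "AagencyRole Product_Number\n",    # attribute line, ignored
    "-some_number\n",                  # text line, ignored
    ")oi:ItemID\n",                    # end tag, ignored
    "(oi:ItemID\n",
    "-some_number\n",
    ")oi:ItemID\n",
    ")is:ItemMaster\n",
    ")Data\n",
);

my $nb = count_start_tags(@pyx);
print "$_ used $nb->{$_} time(s)\n" for sort keys %$nb;
```

        To run it against the real data you would feed it the lines coming out of pyx instead of the sample array.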