Re^2: Using XML Twig to summarize a large file

You are correct. I cut and pasted and then entered the populate sub. My understanding is that XML::Twig sets up handlers that are called for each element in the XML when you parse the file. The XML I'm dealing with is structured in a highly peculiar way. There's a brief header with information that is irrelevant to what I'm using the data for. All remaining data is under a parent titled "Data." Under that parent are roughly 500 children, each of which is a product with roughly 300 properties set up as children of its own. The problem for me is that those properties aren't uniform: 60 items may have a listing for "number of pages" while others will have "number of tracks." Each item is massive, so here's a brief snippet.

<is:ItemMaster>
  <is:ItemMasterHeader>
    <oi:ItemID agencyRole="Product_Number">some_number</oi:ItemID>
    <oi:ItemID agencyRole="Prefix_Number">some_number</oi:ItemID>
    <oi:ItemID agencyRole="Stock_Number">some_number</oi:ItemID>
    <oi:ManufacturerItemID>some_manufacturer_ID</oi:ManufacturerItemID>
    <is:Classification type="Group"></is:Classification>
    <is:Classification></is:Classification>

Each of these ItemMasters has around eight children, and those children have anywhere from one to twenty-four children of their own. Because the children are not uniform, this is giving me headaches. Here's my first revision:

#!/bin/perl
use XML::Twig;
%Items=();

my $twig=XML::Twig->new(
    twig_handlers =>
      {populate => sub {
           while (<>) {
               # If element is not in the hash, adds it
               if (%Items !~ m/"<us:"|"<oa:"(.*)/) { $Items{$1} =1}
               # If element is in the hash, adds the number of matches
               else {$Items{$2} =($Items{$1}+(/$1/)) }
           };
       },
       div => sub { $_[0]->purge; }, # free memory
      },
);
$twig->parsefile( '500syncItemMaster.xml'); # build it
$twig->purge;                  # clear end of document from memory
print %Items;                 # output the twig

Now when I print I get nothing. I tried a test run and it seems like the handlers are not getting called at all.

Re^3: Using XML Twig to summarize a large file by mirod (Canon) on Nov 07, 2007 at 16:10 UTC
A handler is called when the associated expression triggers it, so what you wrote triggers a handler on every `populate` element. I don't see any element by that name in the XML, so the handler will not be called. Is there anything wrong with the `pyx` code I posted below? Or is there a specific reason why you want to use XML::Twig even though it is not the best-suited tool for the task?
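To make the triggering rule concrete, here is a minimal sketch, assuming a small inline sample rather than the real file. It relies on XML::Twig's special `_all_` handler key, which fires for every element regardless of its name. The sub name `count_elements` and the sample element names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Count how many times each element name occurs in an XML string.
# (count_elements is a hypothetical helper, not part of XML::Twig.)
sub count_elements {
    my ($xml) = @_;
    my %count;
    my $twig = XML::Twig->new(
        twig_handlers => {
            # '_all_' is triggered for every element once it is fully parsed;
            # a key such as 'populate' would only fire on <populate> elements.
            _all_ => sub {
                my ($t, $elt) = @_;
                $count{ $elt->name }++;
                $t->purge;    # release what has been parsed so far
            },
        },
    );
    $twig->parse($xml);
    return \%count;
}

# Made-up sample data with non-uniform children
my $sample = <<'XML';
<Data>
  <ItemMaster><Pages>10</Pages><Title>a</Title></ItemMaster>
  <ItemMaster><Tracks>12</Tracks><Title>b</Title></ItemMaster>
</Data>
XML

my $count = count_elements($sample);
print "$_ used $count->{$_} time(s)\n" for sort keys %$count;
```

For a file on disk, `$twig->parsefile($filename)` would take the place of `$twig->parse($xml)`.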
Re^4: Using XML Twig to summarize a large file by Mr.Churka (Sexton) on Nov 07, 2007 at 17:03 UTC
Thanks! The code you posted gave me the following error. `String found where operator expected at line 2 near "pyx 'bigXML.xml | perl -n -e'" Scalar found where operator expected at line 2 near "BigXML.xml | perl -n -e '$nb" (missing operator before $nb?)` I'm sort of learning Perl crash-course style and wanted to understand why what I wrote bombed so dismally. Twig is mildly complicated, and it seemed like getting a handle on a complex module might be a good (though challenging) way to jump right in. I got handed a Camel book and told "Go learn this. We've got faith in you." Exciting and intimidating at the same time. I'll take any help I can get. Is this a bad way to go about learning Perl?
Re^5: Using XML Twig to summarize a large file by mirod (Canon) on Nov 07, 2007 at 17:40 UTC
What you wrote bombed because, frankly, I don't think you have understood much of the docs for XML::Twig (which is probably understandable if you don't know any Perl). There are much easier ways to start with Perl than starting with a complex module, especially when using it for something it's not really made for. As for the problem with `pyx`, it seems that a cut'n'paste did not quite work as planned, so here is a hopefully better version: `pyx bigXML.xml | perl -n -e '$nb{$1}++ if( m/\A\((.*)\n/); END { map { print "$_ used $nb{$_} time(s)\n";} sort keys %nb;}'` This works on *nix; Windows would probably require different quotes. If you want to understand what's going on, you can use `perldoc XML::PYX` for info about PYX, `perldoc perlrun` to understand the `-n` and `-e` options, and `perldoc -f map` and `perldoc -f sort`.
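For readers who find the one-liner dense, the same counting logic can be sketched as a small script. This is only an illustration: the sub name `count_start_tags` is made up, and the PYX lines are hand-written from the snippet earlier in the thread, not real output from `pyx`. In PYX, a line starting with `(` marks the start of an element and the rest of the line is the element name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count start-tag events in a list of PYX lines.
# (count_start_tags is a hypothetical helper name.)
sub count_start_tags {
    my (@lines) = @_;
    my %nb;
    for my $line (@lines) {
        # "(" at the start of the line means "element opens";
        # capture the element name that follows it
        $nb{$1}++ if $line =~ m/\A\((.*)/;
    }
    return \%nb;
}

# Hand-written PYX sample (an assumption, not output from the real file)
my @sample = (
    "(Data\n",
    "(is:ItemMaster\n",
    "(oi:ItemID\n",
    "AagencyRole Product_Number\n",
    "-some_number\n",
    ")oi:ItemID\n",
    "(oi:ItemID\n",
    "AagencyRole Stock_Number\n",
    "-some_number\n",
    ")oi:ItemID\n",
    ")is:ItemMaster\n",
    ")Data\n",
);

my $nb = count_start_tags(@sample);
print "$_ used $nb->{$_} time(s)\n" for sort keys %$nb;
```

To run it against real data, the sample list could be swapped for `<STDIN>` and the script placed at the receiving end of the `pyx bigXML.xml |` pipe.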