in reply to Using XML Twig to summarize a large file

Your code looks to me like you've copied a big portion from perldoc XML::Twig without understanding it. I don't believe those "title" and "para" handlers are needed for your specific XML.

I'm also more than 99.99% sure that XML::Twig will do what you need. Just show us a snippet of your XML in question and we might be able to help you. This code is, at least for me, in no way helpful.


s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
+.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e

Re^2: Using XML Twig to summarize a large file
by Anonymous Monk on Nov 07, 2007 at 14:58 UTC
    You are correct. I cut and pasted and then entered the populate sub. It is my understanding that twig sets up handlers that are called for each element in the XML when you go to parse the file. The XML I'm dealing with is structured in a highly peculiar way. There's a brief header with information that is irrelevant to what I'm using the data for. All remaining data is under a parent titled "Data." Under that parent are roughly 500 children, each of which is a product with roughly 300 properties set up as children of its own. The problem for me is that those properties aren't uniform. 60 items may have a listing for "number of pages" while others will have "number of tracks." Each item is massive, so here's a brief snippet.
    <is:ItemMaster>
      <is:ItemMasterHeader>
        <oi:ItemID agencyRole="Product_Number">some_number</oi:ItemID>
        <oi:ItemID agencyRole="Prefix_Number">some_number</oi:ItemID>
        <oi:ItemID agencyRole="Stock_Number">some_number</oi:ItemID>
        <oi:ManufacturerItemID>some_manufacturer_ID</oi:ManufacturerItemID>
        <is:Classification type="Group"></is:Classification>
        <is:Classification></is:Classification>
    Each of these ItemMasters has around eight children, and the children have anywhere from one to twenty-four children. Because the children are not uniform, this is giving me headaches. Here's my first revision:
    #!/bin/perl
    use XML::Twig;
    %Items=();
    my $twig=XML::Twig->new(
        twig_handlers =>
          { populate=> sub { while (<>) {
                               if (%Items !~ m/"<us:"|"<oa:"(.*)/) { $Items{$1} =1}
                               else {$Items{$2} =($Items{$1}+(/$1/)) }
                             };  #If element is not in the hash, adds it
                           },    #If element is in the hash, adds the number of matches
            div => sub { $_[0]->purge; },  # free memory
          },
    );
    $twig->parsefile( '500syncItemMaster.xml'); # build it
    $twig->purge;                               # clear end of document from memory
    print %Items;                               # output the twig
    Now when I print I get nothing. I tried a test run and it seems like the handlers are not getting called at all.

      A handler is called when the expression associated with it matches, so what you wrote sets up a handler that is triggered on every element named populate. I don't see any element by that name in the XML, so the handler will never be called. Is there anything wrong with the pyx code I posted below? Or is there a specific reason why you want to use XML::Twig even though it is not the best suited for the task?
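
      To illustrate the point about handlers being keyed on element names, here is a minimal sketch, not a drop-in solution. It assumes the goal is to count how often each element name occurs inside the items. The sample document is hypothetical: it is modeled on the snippet above, but the closing tags, the second item, the values, and the two namespace URIs are made up so that it parses on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data, loosely modeled on the posted snippet.
# The namespace URIs and the values are placeholders.
my $xml = <<'XML';
<Data xmlns:is="urn:example:is" xmlns:oi="urn:example:oi">
  <is:ItemMaster>
    <is:ItemMasterHeader>
      <oi:ItemID agencyRole="Product_Number">1</oi:ItemID>
      <oi:ItemID agencyRole="Stock_Number">2</oi:ItemID>
      <is:Classification type="Group"></is:Classification>
    </is:ItemMasterHeader>
  </is:ItemMaster>
  <is:ItemMaster>
    <is:ItemMasterHeader>
      <oi:ItemID agencyRole="Product_Number">3</oi:ItemID>
    </is:ItemMasterHeader>
  </is:ItemMaster>
</Data>
XML

my %count;    # element name => number of occurrences

my $twig = XML::Twig->new(
    twig_handlers => {
        # The key is an element name as it appears in the file. This sub
        # runs once per is:ItemMaster, after its closing tag is parsed.
        'is:ItemMaster' => sub {
            my ( $t, $item ) = @_;

            # Walk everything below the item, skipping text nodes,
            # and tally each element name.
            $count{ $_->tag }++ for grep { $_->is_elt } $item->descendants;

            $t->purge;    # release the memory held by the finished item
        },
    },
);

# For the real file this would be $twig->parsefile('500syncItemMaster.xml').
$twig->parse($xml);

printf "%-22s %d\n", $_, $count{$_} for sort keys %count;
```

      Because the handler is attached to is:ItemMaster rather than to a name that never occurs in the file, it actually fires, and purging inside it keeps only one item in memory at a time.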

        Thanks! The code you posted gave me the following error.
        String found where operator expected at line 2 near "pyx 'bigXML.xml | perl -n -e'"
        Scalar found where operator expected at line 2 near "BigXML.xml | perl -n -e '$nb" (missing operator before $nb?)
        I'm sort of learning Perl crash-course style and wanted to understand why what I wrote bombed so dismally. Twig is mildly complicated, and it seemed like getting a handle on a complex module might be a good (though challenging) way to jump right in. I got handed a Camel book and told "Go learn this. We've got faith in you." Exciting and intimidating at the same time. I'll take any help I can get. Is this a bad way to go about learning Perl?