I have successfully written a program that parses a large XML document using XML::Twig. The XML file describes 500 products with varying attributes. One might be a table with certain dimensions. Another might be a personal organizer with a weekly calendar and pockets for a calculator, cellphone, etc. To get an idea of what variety of tables I need in my database, I wrote the following to summarize the XML.
    #!/bin/perl
    use strict;
    use warnings;
    use XML::Twig;
    use Tie::IxHash;

    # Tie::IxHash keeps keys in the order they are first inserted.
    my %Items;
    tie %Items, "Tie::IxHash";

    my $twig = XML::Twig->new(
        twig_handlers => {
            _all_ => sub {
                # In scalar context this is the number of ancestors (the indent depth).
                my $Item_master_Ancestory = $_->ancestors;
                my $element_match         = $_->tag;
                my $text                  = $_->trimmed_text;
                my $coupled = join(
                    ' - ' => " " x $Item_master_Ancestory,
                    $element_match, keys %{ $_->atts }, values %{ $_->atts }, $text
                );
                $Items{$coupled}++;    # count how often each unique key occurs
            },
        },
    );
    $twig->parsefile('500syncItemMaster.xml');    # build it
    $twig->purge;                                 # clear end of document from memory

    open( my $Output_Filehandle, '>', 'United perl parser summary.txt' )
        or die "Cannot open summary file: $!";
    foreach my $k ( keys %Items ) {
        print $Output_Filehandle "$k => $Items{$k}\n";
    }
    close $Output_Filehandle;

I used join to combine all the relevant data (indent depth, tag, attribute names, attribute values and text) into a unique hash key, and then set the value to the number of times that key occurs in the XML file. This gave me a nice breakdown of what unique items are in the XML. My problem is twofold.
First, some of the elements are picking up the text from their children while others don't. This is very strange to me.
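To narrow the first problem down, here is a minimal sketch that compares two XML::Twig text accessors on a tiny document. The XML string and the %seen hash are made up for illustration and are not taken from the real catalog file. As far as I can tell from the XML::Twig documentation, trimmed_text includes the text of sub-elements while text_only does not, so this may be where the children's text comes from.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical two-level document, not taken from the real catalog file.
my $xml = '<item><name>Desk</name><size>60x30</size></item>';

my %seen;    # tag => text reported by each accessor
my $twig = XML::Twig->new(
    twig_handlers => {
        _all_ => sub {
            my $elt = $_;
            $seen{ $elt->tag } = {
                all => $elt->trimmed_text,    # element text plus descendants' text
                own => $elt->text_only,       # text directly inside the element only
            };
        },
    },
);
$twig->parse($xml);

# Print both views side by side for each tag.
for my $tag (qw(item name size)) {
    printf "%-5s all=[%s] own=[%s]\n", $tag, $seen{$tag}{all}, $seen{$tag}{own};
}
```

If the "all" column for item contains the text of name and size while the "own" column does not, then using text_only in the summary key might be enough to keep parent entries from repeating their children's text.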
Second, Tie::IxHash didn't keep the entries in the same order as the XML file. For example, the summary becomes:

    <Catalog>
    <item>
    <quantities>z
    <prices>
    <sellingpoints>
    <item>
    <quantities>
    <prices>
    <sellingpoints>
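A small sketch that may help with the ordering question: it records the order in which an _all_ handler is called on a made-up document (the XML string and @order array are only for illustration). Tie::IxHash keeps keys in the order they are first stored, so whatever order the handlers run in is the order the summary would follow. My understanding from the XML::Twig documentation is that twig_handlers are called once an element has been completely parsed, that is, at its closing tag.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical nested document, not taken from the real catalog file.
my $xml = '<Catalog><item><quantities>1</quantities><prices>2</prices></item></Catalog>';

my @order;    # tags in the order the handler sees them
my $twig = XML::Twig->new(
    twig_handlers => {
        _all_ => sub { push @order, $_->tag },
    },
);
$twig->parse($xml);

print join( ' ', @order ), "\n";
```

If the children are listed before their parents here, the hash keys are being inserted in closing-tag order rather than document order, and recording the keys from start_tag_handlers instead might be worth trying.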