in reply to XML::Twig blues
I was able to run the code on the sample data provided in the OP, and got the same actual output as shown in the OP, but I think that your presentation of the expected/desired output is not quite consistent with the sample data. At least, if there is a way to get from the OP xml sample to the expected output as posted, it's quite unclear how that could happen.
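Probably the core of the problem is that you assume a fresh array called @fexassembly gets created every time "sub fileexists" is invoked. Since several other named subroutines use this array via closure, I think those subs are bound to just the first instance of the array and keep using it every time they get called. That is exactly what the "harmless warnings" you mentioned are telling you, the ones that say things like "Variable "@fexassembly" will not stay shared at itpslc.pl line ...". They are not harmless: Perl issues that warning when a named sub nested inside another sub refers to the outer sub's lexical variable, because the inner sub only ever sees the variable from the outer sub's first call.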
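To make the "will not stay shared" behavior concrete, here is a minimal sketch. All sub and variable names are made up for illustration; this is not the OP's code, just the same shape of problem next to one possible fix.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical names throughout; this only illustrates the warning.

# Problematic shape: a named sub declared inside another named sub.
# Named subs are compiled once, so inner_count() is bound to the @items
# that exists for the FIRST call of outer_broken() and is never rebound.
{
    no warnings 'closure';   # hides: Variable "@items" will not stay shared
    sub outer_broken {
        my @items = @_;
        sub inner_count { return scalar @items }
        return inner_count();
    }
}

# Safer shape: hand the data to the helper explicitly, so every call
# works on the array that belongs to the current invocation.
sub count_items {
    my ($items) = @_;
    return scalar @$items;
}

sub outer_fixed {
    my @items = @_;
    return count_items( \@items );
}

print "broken: ", outer_broken(qw(a b c)), " ", outer_broken(qw(a)), "\n";
print "fixed:  ", outer_fixed(qw(a b c)),  " ", outer_fixed(qw(a)),  "\n";
```

With the nested version, inner_count() keeps reporting the size of the array from the first call, no matter what later calls pass in. Passing a reference (or using an anonymous sub, which is created anew on each call) avoids that.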
In general, looking over the 1300+ lines of code, I think you're making this much harder than it needs to be. If the goal is to shove xml data into a DB table, you just need to decide which xml elements will constitute distinct rows, and set up your parser handlers so that you do a single insert every time you hit the end tags of appropriate elements.
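As a rough sketch of the "one insert per end tag" idea using XML::Twig itself: a single parser object, one shared handler, and one row emitted each time a row-level element closes. The sample XML and its child tags (path, result, user, command) are invented for illustration, and the @rows array merely stands in for the database.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data; only the outer element names follow this thread.
my $xml = <<'XML';
<itpslsig><itpslbody><mainblock><subblock>
  <fileexists><path>/etc/passwd</path><result>yes</result></fileexists>
  <userexec><user>root</user><command>ls</command></userexec>
</subblock></mainblock></itpslbody></itpslsig>
XML

my @rows;    # stands in for the DB: one entry per would-be INSERT

# Called once per row-level element, when its end tag has been parsed.
sub emit_row {
    my ( $twig, $elt ) = @_;
    my %row = ( STARTOP => $elt->tag );
    # Each child element becomes one column of the row.
    $row{ $_->tag } = $_->text for $elt->children('#ELT');
    push @rows, \%row;    # real code would execute a prepared INSERT here
    $twig->purge;         # release what has been parsed so far
}

# One twig, one handler shared by every row-level tag.
my $twig = XML::Twig->new(
    twig_handlers => { map { $_ => \&emit_row } qw(fileexists userexec) },
);
$twig->parse($xml);

for my $row (@rows) {
    print join( ', ', map { "$_ => $row->{$_}" } sort keys %$row ), "\n";
}
```

Supporting another row-level element then means adding its tag name to the qw() list, not writing another subroutine or creating another parser object.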
XML::Twig is already creating a (rather massive, bulky) data structure for you from the xml, so you shouldn't need to build additional structures containing the same data. You should also be able to do at least some amount of abstraction or generalization regarding the transform from xml elements to DB table rows. You shouldn't need a separate subroutine for every distinct low-level tag, since you end up doing basically the same operations for all of them.
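One way to get that generalization is a small lookup table that says, for each row-level tag, which table and columns it maps to, plus a single routine that builds the INSERT. This is only a sketch: the table names, column names and the %ROW_MAP structure are assumptions, since I don't know your actual schema.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical mapping: row-level tag => [ table name, column tags ].
my %ROW_MAP = (
    fileexists => [ 'fileexists_ops', [qw(path result)] ],
    userexec   => [ 'userexec_ops',   [qw(user command)] ],
);

# Build one parameterized INSERT for any mapped tag.
# Returns ( $sql, \@bind_values ), or an empty list for unmapped tags.
sub row_to_insert {
    my ( $tag, $fields ) = @_;
    my $spec = $ROW_MAP{$tag} or return;
    my ( $table, $cols ) = @$spec;
    my $sql = sprintf 'INSERT INTO %s (%s) VALUES (%s)',
        $table,
        join( ', ', @$cols ),
        join( ', ', ('?') x @$cols );    # one placeholder per column
    return ( $sql, [ map { $fields->{$_} } @$cols ] );
}

my ( $sql, $bind ) = row_to_insert(
    'fileexists', { path => '/etc/passwd', result => 'yes' },
);
print "$sql  [@$bind]\n";
```

The SQL string and bind list can be handed to DBI's prepare/execute, or simply printed if you only want to generate statements. Either way, the per-tag subroutines collapse into one routine plus data.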
Actually, the fact that you have "my $twig = new XML::Twig::XPath( TwigHandlers => ..." in nine different places in your code probably indicates a basic misunderstanding about how the task as a whole should be addressed. One parser object with one set of handlers should be enough for a single pass over the file.
Bottom line: you seem to be doing so many things wrong here that you're probably better off ditching this attempt and starting over from scratch. Start with a description of the task that's as simple and direct as possible -- something like 'for every element of type ..., insert a row into table ...' If you lay out the plan for mapping container elements to rows, and their contained elements to columns, you'll find ways to generalize across the details. (Whether you actually connect to a DB and do inserts, or simply print SQL statements as output is your choice.)
XPath might not be the best tool; a basic parser that fills slots in a data structure and inserts each completed structure to the database is what you want, I'm guessing.
(updated to fix typo in next-to-last paragraph)
Another update: Here's a simple approach using XML::LibXML; the same can presumably be done with other parsers, but maybe not so compactly (both in lines of code and in memory footprint)... The output might not be exactly what you were aiming for, but I think it's close enough that the differences are trivial.
Obviously you'll want to add stuff to cover lots of other issues, but if you follow the basic approach, you should end up with far fewer than 1300 lines of code, and it'll be a lot easier to maintain. (updated one last time to fix code tags)

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use Data::Dumper 'Dumper';

my $xml = XML::LibXML->new();
my $doc = $xml->parse_file( "j.xml" );
my $pth = XML::LibXML::XPathContext->new( $doc );

my @subblocks;
for my $sbnode ( $pth->findnodes( "/itpslsig/itpslbody/mainblock/subblock" )) {
    for my $sbchild ( $sbnode->childNodes ) {
        # only these two element types become "rows"
        next unless ( $sbchild->nodeName =~ /fileexists|userexec/ );
        my %feitem = ( STARTOP => $sbchild->nodeName );
        # each child element fills one slot of the row
        for my $fechild ( $sbchild->childNodes ) {
            $feitem{$fechild->nodeName} = $fechild->textContent;
        }
        # grep /^\w/ drops keys like "#text" (whitespace between elements)
        my @features = map { "$_ => $feitem{$_}" } sort grep /^\w/, keys %feitem;
        push @subblocks, [ @features ];
    }
}
print Dumper( \@subblocks );
```