I followed the link to your svn repository. It's good that you decided not to post all 1300+ lines of code here, but it's sad that you also decided not to boil things down to a minimal snippet of code to demonstrate the problem.

I was able to run the code on the sample data provided in the OP, and got the same actual output as shown in the OP, but I think that your presentation of the expected/desired output is not quite consistent with the sample data. At least, if there is a way to get from the OP xml sample to the expected output as posted, it's quite unclear how that could happen.

The core of the problem is probably this: you assume that a fresh array called @fexassembly gets created every time "sub fileexists" is invoked. But several other named subroutines use that array via closure, and I think those subs are bound to the first instance of the array only, so every later call still works on that first instance. That is what the "harmless warnings" you mentioned are telling you, the ones that say things like 'Variable "@fexassembly" will not stay shared at itpslc.pl line ...'
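To show that behaviour outside of your code, here is a minimal sketch. The names collect_bad, collect_good and add_item are made up for the illustration and have nothing to do with itpslc.pl. The anonymous sub is only one possible way around the problem; passing the array by reference to ordinary top-level subs would work as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Problematic shape: a named sub nested inside another sub. Perl compiles
# add_item() once, so it stays bound to the @items that belongs to the
# FIRST call of collect_bad().
sub collect_bad {
    my @items;
    # Silence the compile-time warning for this demo only; remove this
    # line to see 'Variable "@items" will not stay shared'.
    no warnings 'closure';
    sub add_item { push @items, @_ }
    add_item($_) for @_;
    return scalar @items;    # counts this call's own @items
}

# One possible fix: an anonymous sub is created anew on every call and
# closes over the current @items.
sub collect_good {
    my @items;
    my $add_item = sub { push @items, @_ };
    $add_item->($_) for @_;
    return scalar @items;
}

print "bad:  ", collect_bad(qw(a b)),  " then ", collect_bad(qw(c)),  "\n";
print "good: ", collect_good(qw(a b)), " then ", collect_good(qw(c)), "\n";
```

The second call to collect_bad reports 0 items, because add_item is still pushing onto the array from the first call, while collect_good reports 2 and then 1.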

In general, looking over the 1300+ lines of code, I think you're making this much harder than it needs to be. If the goal is to shove xml data into a DB table, you just need to decide which xml elements will constitute distinct rows, and set up your parser handlers so that you do a single insert every time you hit the end tags of appropriate elements.

XML::Twig is already creating a (rather massive, bulky) data structure for you from the xml, so you shouldn't need to build additional structures containing the same data. You should also be able to do at least some amount of abstraction or generalization regarding the transform from xml elements to DB table rows. You shouldn't need a separate subroutine for every distinct low-level tag, since you end up doing basically the same operations for all of them.
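As a rough sketch of what "one handler for all row-level tags" could look like if you stay with XML::Twig: the sample XML below is invented (only the element names fileexists and userexec and the path come from this thread), and row_handler and @rows are hypothetical names. Where the push happens is where a single DB insert would go.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Invented sample input; the child elements are assumptions for the demo.
my $xml = <<'XML';
<itpslsig><itpslbody><mainblock><subblock>
<fileexists><filename>/etc/passwd</filename><type>regular</type></fileexists>
<userexec><username>root</username><command>ls</command></userexec>
</subblock></mainblock></itpslbody></itpslsig>
XML

my @rows;

# A single handler shared by every row-level element. It fires when the
# end tag is reached: the tag name goes into one field, and each child
# element becomes a column.
sub row_handler {
    my ( $twig, $elt ) = @_;
    my %row = ( STARTOP => $elt->tag );
    $row{ $_->tag } = $_->text for grep { $_->is_elt } $elt->children;
    push @rows, \%row;    # a DB insert would go here instead
    $twig->purge;         # release the part of the tree already handled
}

my $twig = XML::Twig->new(
    twig_handlers => { map { $_ => \&row_handler } qw(fileexists userexec) },
);
$twig->parse($xml);

for my $row (@rows) {
    print join( ', ', map { "$_ => $row->{$_}" } sort keys %$row ), "\n";
}
```

Adding another row-level tag then means adding one word to the qw() list rather than writing another subroutine.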

Actually, the fact that you have "my $twig = new XML::Twig::XPath( TwigHandlers => ..." in nine different places in your code probably indicates that there is a basic misunderstanding about how the task as a whole should be addressed.

Bottom line: you seem to be doing so many things wrong here that you're probably better off ditching this attempt and starting over from scratch. Start with a description of the task that's as simple and direct as possible -- something like 'for every element of type ..., insert a row into table ...' If you lay out the plan for mapping container elements to rows, and their contained elements to columns, you'll find ways to generalize across the details. (Whether you actually connect to a DB and do inserts, or simply print SQL statements as output is your choice.)
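Here is a small sketch of how such a plan could be written down as data instead of code. The table names, column lists and the names %row_map and insert_for are all hypothetical; I don't know your actual schema. It only builds the SQL text and the bind values, so it covers the "simply print SQL statements" option.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical plan: row-level element name => [ table name, column list ].
my %row_map = (
    fileexists => [ 'file_checks', [qw(filename type)] ],
    userexec   => [ 'user_execs',  [qw(username command)] ],
);

# Build one parameterised INSERT for a record whose fields were collected
# from the children of a row-level element. Returns the SQL string followed
# by the bind values, or an empty list for elements not in the plan.
sub insert_for {
    my ( $element, $fields ) = @_;
    my $spec = $row_map{$element} or return;
    my ( $table, $columns ) = @$spec;
    my $sql = sprintf 'INSERT INTO %s (%s) VALUES (%s)',
        $table,
        join( ', ', @$columns ),
        join( ', ', ('?') x @$columns );
    return ( $sql, map { $fields->{$_} } @$columns );
}

my ( $sql, @bind ) = insert_for(
    fileexists => { filename => '/etc/passwd', type => 'regular' } );
print "$sql | ", join( ', ', @bind ), "\n";
```

If you do connect to a database, the SQL string and bind values are in the shape that DBI's prepare and execute expect, and supporting a new element type comes down to one more entry in %row_map.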

XPath might not be the best tool; I'm guessing that what you want is a basic parser that fills slots in a data structure and inserts each completed structure into the database.

(updated to fix typo in next-to-last paragraph)

Another update: Here's a simple approach using XML::LibXML; the same can presumably be done with other parsers, but maybe not so compactly (both in lines of code and in memory footprint)... The output might not be exactly what you were aiming for, but I think it's close enough that the differences are trivial.

#!/usr/bin/perl

use strict;
use warnings;
use XML::LibXML;
use Data::Dumper 'Dumper';

my $xml = XML::LibXML->new();
my $doc = $xml->parse_file( "j.xml" );
my $pth = XML::LibXML::XPathContext->new( $doc );

my @subblocks;
for my $sbnode ( $pth->findnodes( "/itpslsig/itpslbody/mainblock/subblock" )) {
    for my $sbchild ( $sbnode->childNodes ) {
        next unless ( $sbchild->nodeName =~ /fileexists|userexec/ );
        my %feitem = ( STARTOP => $sbchild->nodeName );
        for my $fechild ( $sbchild->childNodes ) {
            $feitem{$fechild->nodeName} = $fechild->textContent;
        }
        # grep /^\w/ drops keys such as "#text" that come from whitespace nodes
        my @features = map { "$_ => $feitem{$_}" } sort grep /^\w/, keys %feitem;
        push @subblocks, [ @features ];
    }
}

print Dumper( \@subblocks );

Obviously you'll want to add stuff to cover lots of other issues, but if you follow the basic approach, you should end up with far fewer than 1300 lines of code, and it'll be a lot easier to maintain. (updated one last time to fix code tags)

In reply to Re: XML::Twig blues by graff
in thread XML::Twig blues by gmagklaras
