Many of the approaches in this thread centered around using XML::Simple. Why not try using XML::SAX and build your own SAX event handler. I believe it can satify your requirements while at the same time providing more flexibility than XML::Parser's interface.

A good introduction to creating SAX event handlers can be found at XML::SAX::Intro in the XML::SAX distribution on CPAN.

To address you're question here's a working example:

#!/usr/bin/perl -wT use strict; use XML::SAX; use Data::Dumper qw(DumperX); my $handler = My::SAXParser->new; my $parser = XML::SAX::ParserFactory->parser(Handler => $handler); #pass the XML document at the bottom __DATA__ tag to the parser $parser->parse_string(do { local $/; <DATA> }); print DumperX($handler->nodes); { #this class keeps track of the processed nodes package My::SAXParser; use strict; use base qw(XML::SAX::Base); use Class::MethodMaker get_set => ['nodes'], list => ['element_stack']; use constant SKIP_NODE => 'xml'; sub start_document { shift->nodes({}) } sub start_element { my $self = shift; my $el = shift; return if $el->{Name} eq SKIP_NODE; #make note of which element we are processing - in the stack $self->element_stack_push(\my %element); foreach my $attribute (values %{$el->{Attributes}}) { push @{$element{attributes}}, @$attribute{qw(Name Value)}; } #keep track of all interesting element nodes push @{ $self->nodes->{$el->{Name}} }, \%element; return $self->SUPER::start_element($el); } sub characters { my $self = shift; return unless $self->element_stack_count; #are there any pending +element nodes to process? return $self->SUPER::characters($self->element_stack->[-1]->{text} + .= shift->{Data}); } sub end_element { my $self = shift; $self->element_stack_pop; #element has been processed, pop it off + the stack return $self->SUPER::end_element(shift); } } __DATA__ <xml> <requirement contactname="Joe Average">A power cord.</requirement> <requirement contactname="Jane Smith" contactnumber="555-1212">A node +name</requirement> </xml>

This should produce the following output:

$VAR1 = { 'requirement' => [ { 'text' => 'A power cord.', 'attributes' => [ 'contactname', 'Joe Average' ] }, { 'text' => 'A node name', 'attributes' => [ 'contactnumber', '555-1212', 'contactname', 'Jane Smith' ] } ] };

I tested this code with the other XML document example you posted in this thread. It can parse it and I believe it produces a pretty reasonable output.

Also if performance is an issue it's possible to gain further speed increases using XML::LibXML::SAX::Parser or XML::SAX::Expat. Either of these modules can pretty much just be dropped into the above script by modifying two lines of the script's code: the use and new constructor statements.


In reply to (dkubb) Re: (2) XML parsing and SAX event handlers by dkubb
in thread Predefining complex data structures? by Ionizor

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.