Many of the approaches in this thread centered on XML::Simple. Why not use XML::SAX and build your own SAX event handler? I believe it can satisfy your requirements while providing more flexibility than XML::Parser's interface.
A good introduction to writing SAX event handlers is XML::SAX::Intro, which ships with the XML::SAX distribution on CPAN.
To address your question, here's a working example:
#!/usr/bin/perl -wT
use strict;
use XML::SAX;
use XML::SAX::ParserFactory;
use Data::Dumper qw(DumperX);

my $handler = My::SAXParser->new;
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);

# pass the XML document after the __DATA__ token at the bottom to the parser
$parser->parse_string(do { local $/; <DATA> });
print DumperX($handler->nodes);
{
    # this class keeps track of the processed nodes
    package My::SAXParser;
    use strict;
    use base qw(XML::SAX::Base);
    use Class::MethodMaker
        get_set => ['nodes'],
        list    => ['element_stack'];

    # the document's root element, which is not recorded
    use constant SKIP_NODE => 'xml';

    sub start_document { shift->nodes({}) }
    sub start_element {
        my $self = shift;
        my $el   = shift;
        return if $el->{Name} eq SKIP_NODE;

        # make note of which element we are processing - in the stack
        $self->element_stack_push(\my %element);

        # Attributes is a hash, so the name/value pairs come out in no fixed order
        foreach my $attribute (values %{$el->{Attributes}}) {
            push @{$element{attributes}}, @$attribute{qw(Name Value)};
        }

        # keep track of all interesting element nodes
        push @{ $self->nodes->{$el->{Name}} }, \%element;
        return $self->SUPER::start_element($el);
    }
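    sub characters {
        my $self = shift;
        my $data = shift;
        # are there any pending element nodes to process?
        return unless $self->element_stack_count;
        # append the text to the innermost open element
        $self->element_stack->[-1]->{text} .= $data->{Data};
        # pass the original event hash (not the accumulated string) along
        return $self->SUPER::characters($data);
    }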
    sub end_element {
        my $self = shift;
        # element has been processed, pop it off the stack
        $self->element_stack_pop;
        return $self->SUPER::end_element(shift);
    }
}
__DATA__
<xml>
<requirement contactname="Joe Average">A power cord.</requirement>
<requirement contactname="Jane Smith" contactnumber="555-1212">A node name</requirement>
</xml>
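This should produce output like the following (the order of hash keys, and of the attribute name/value pairs, can differ between runs because both come from Perl hashes):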
$VAR1 = {
          'requirement' => [
                             {
                               'text' => 'A power cord.',
                               'attributes' => [
                                                 'contactname',
                                                 'Joe Average'
                                               ]
                             },
                             {
                               'text' => 'A node name',
                               'attributes' => [
                                                 'contactnumber',
                                                 '555-1212',
                                                 'contactname',
                                                 'Jane Smith'
                                               ]
                             }
                           ]
        };
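Because each element's attributes are stored as a flat list of name/value pairs, one convenient way to consume the result is to turn that list back into a hash, which also makes the varying attribute order irrelevant. This is a minimal sketch, assuming the structure shown above: `$nodes` is a hand-copied literal standing in for `$handler->nodes`, and the `contact`/`phone` field names are purely illustrative.

```perl
use strict;
use warnings;

# Hand-copied stand-in for $handler->nodes (illustrative data only).
my $nodes = {
    requirement => [
        { text       => 'A power cord.',
          attributes => [ contactname => 'Joe Average' ] },
        { text       => 'A node name',
          attributes => [ contactnumber => '555-1212',
                          contactname   => 'Jane Smith' ] },
    ],
};

# Each 'attributes' array is name/value pairs, so it can be assigned
# straight into a hash; an element without attributes has no such key.
my @requirements;
for my $req (@{ $nodes->{requirement} }) {
    my %attr = @{ $req->{attributes} || [] };
    push @requirements, {
        text    => $req->{text},
        contact => $attr{contactname},
        phone   => $attr{contactnumber},   # undef when the attribute is absent
    };
}

for my $r (@requirements) {
    printf "%s (%s%s)\n", $r->{text}, $r->{contact},
        defined $r->{phone} ? ", $r->{phone}" : '';
}
```

Run as-is, this prints `A power cord. (Joe Average)` followed by `A node name (Jane Smith, 555-1212)`.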
I also tested this code with the other XML document you posted in this thread. It parses that document as well, and I believe the output is quite reasonable.
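Also, if performance is an issue, you can get further speed-ups by using XML::LibXML::SAX::Parser or XML::SAX::Expat as the parser. (Note that XML::LibXML::SAX::Parser builds a DOM first and then generates SAX events from it; XML::LibXML also provides XML::LibXML::SAX, which parses in a streaming fashion.) Either module can be dropped into the script above by changing just two statements: the use line and the parser constructor, for example XML::SAX::Expat->new(Handler => $handler) in place of the ParserFactory call.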
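Since the script already obtains its parser from XML::SAX::ParserFactory, another option is to leave both statements alone and name the backend in the package variable `$XML::SAX::ParserPackage`, which the factory checks before consulting its list of installed parsers. This is a minimal sketch, assuming XML::SAX is installed: XML::SAX::Expat is requested only if it actually loads (otherwise the factory uses its default, which is XML::SAX::PurePerl when nothing else is registered), and `RequirementCounter` is a hypothetical handler written just for this illustration.

```perl
use strict;
use warnings;
use XML::SAX::ParserFactory;

# Hypothetical minimal handler: it only counts <requirement> start tags,
# so the example stays focused on choosing the parser backend.
package RequirementCounter;
use base qw(XML::SAX::Base);

sub start_element {
    my ($self, $el) = @_;
    $self->{count}++ if $el->{Name} eq 'requirement';
    return $self->SUPER::start_element($el);
}

package main;

# Prefer XML::SAX::Expat when it can be loaded; otherwise leave
# $XML::SAX::ParserPackage unset and let the factory pick its default.
if (eval { require XML::SAX::Expat; 1 }) {
    $XML::SAX::ParserPackage = 'XML::SAX::Expat';
}

my $counter = RequirementCounter->new;
my $parser  = XML::SAX::ParserFactory->parser(Handler => $counter);

$parser->parse_string(<<'XML');
<xml>
<requirement contactname="Joe Average">A power cord.</requirement>
<requirement contactname="Jane Smith" contactnumber="555-1212">A node name</requirement>
</xml>
XML

# ref $parser reports which backend class the factory actually returned
printf "%s found %d requirements\n", ref $parser, $counter->{count};
```

Because the handler only talks to the SAX event interface, the same handler class works unchanged whichever backend the factory returns.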