in reply to Predefining complex data structures?
Many of the approaches in this thread centered around using XML::Simple. Why not try XML::SAX and build your own SAX event handler? I believe it can satisfy your requirements while also providing more flexibility than XML::Parser's interface.
A good introduction to creating SAX event handlers can be found at XML::SAX::Intro in the XML::SAX distribution on CPAN.
To address your question, here's a working example:
#!/usr/bin/perl -wT
use strict;
use XML::SAX;
use Data::Dumper qw(DumperX);

my $handler = My::SAXParser->new;
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);

#pass the XML document at the bottom __DATA__ tag to the parser
$parser->parse_string(do { local $/; <DATA> });

print DumperX($handler->nodes);

{
    #this class keeps track of the processed nodes
    package My::SAXParser;
    use strict;
    use base qw(XML::SAX::Base);
    use Class::MethodMaker
        get_set => ['nodes'],
        list    => ['element_stack'];

    use constant SKIP_NODE => 'xml';

    sub start_document { shift->nodes({}) }

    sub start_element {
        my $self = shift;
        my $el   = shift;
        return if $el->{Name} eq SKIP_NODE;

        #make note of which element we are processing - in the stack
        $self->element_stack_push(\my %element);

        foreach my $attribute (values %{$el->{Attributes}}) {
            push @{$element{attributes}}, @$attribute{qw(Name Value)};
        }

        #keep track of all interesting element nodes
        push @{ $self->nodes->{$el->{Name}} }, \%element;
        return $self->SUPER::start_element($el);
    }

    sub characters {
        my $self = shift;
        return unless $self->element_stack_count; #are there any pending element nodes to process?
        return $self->SUPER::characters($self->element_stack->[-1]->{text} .= shift->{Data});
    }

    sub end_element {
        my $self = shift;
        $self->element_stack_pop; #element has been processed, pop it off the stack
        return $self->SUPER::end_element(shift);
    }
}

__DATA__
<xml>
<requirement contactname="Joe Average">A power cord.</requirement>
<requirement contactname="Jane Smith" contactnumber="555-1212">A node name</requirement>
</xml>
This should produce the following output:
$VAR1 = {
  'requirement' => [
    {
      'text' => 'A power cord.',
      'attributes' => [
        'contactname',
        'Joe Average'
      ]
    },
    {
      'text' => 'A node name',
      'attributes' => [
        'contactnumber',
        '555-1212',
        'contactname',
        'Jane Smith'
      ]
    }
  ]
};
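Note that each node's attributes come back as a flat name/value list, in whatever order the parser's attribute hash happened to yield them. If you would rather look attributes up by name, a small sketch like the following should do it. The sample $node is hand-written here to mirror one entry of the dump, not taken from a live parse:

```perl
use strict;
use warnings;

# Hand-written sample shaped like one 'requirement' entry in the dump
my $node = {
    text       => 'A node name',
    attributes => [ 'contactnumber', '555-1212', 'contactname', 'Jane Smith' ],
};

# A flat (name, value, name, value, ...) list assigns straight into a hash
my %attr = @{ $node->{attributes} };

print "$attr{contactname}: $attr{contactnumber}\n";
```

This only works cleanly because attribute names are unique within an element, so no pair overwrites another.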
I tested this code with the other XML document example you posted in this thread. It can parse it and I believe it produces a pretty reasonable output.
Also, if performance is an issue, it's possible to gain further speed by using XML::LibXML::SAX::Parser or XML::SAX::Expat. Either of these modules can pretty much just be dropped into the above script by modifying two lines of its code: the use statement and the constructor call that creates the parser.
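As a rough sketch of that swap, assuming XML::SAX::Expat is installed: the use line names the parser module directly, and the parser is built with its own new instead of going through XML::SAX::ParserFactory. My::CountingHandler is a made-up stand-in so the snippet runs by itself; in the script above you would pass the My::SAXParser handler instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::SAX::Expat;    # changed line 1 (or: use XML::LibXML::SAX::Parser;)

{
    # Hypothetical stand-in handler: just counts start tags
    package My::CountingHandler;
    use base qw(XML::SAX::Base);

    sub start_document { $_[0]{count} = 0 }
    sub start_element  { $_[0]{count}++ }
    sub count          { $_[0]{count} }
}

my $handler = My::CountingHandler->new;

# changed line 2: construct the chosen parser directly
my $parser = XML::SAX::Expat->new(Handler => $handler);

$parser->parse_string('<xml><requirement>A power cord.</requirement></xml>');
print $handler->count, "\n";
```

The handler code itself does not change, which is the main attraction of writing against the SAX interface rather than a specific parser.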