in reply to Regarding XML::DOM::Parser

I find that when the data is unpredictable (as it is here, where you don't know in advance which nodes might come up), it's better to parse it from scratch.

I might be totally wrong, but I am not sure the XML you have described is well-formed, as it is required to be.

Either way, here is a hopefully readable, though not especially efficient, piece of code that does what you are aiming for:

    use strict;
    use warnings;
    use Data::Dump qw( dump );

    # Slurp everything after __DATA__ into a single string.
    my $data = do { local $/; <DATA> };

    my @people_educations;

    # Each <Person>...</Person> block (case-insensitive, may span lines).
    while ( $data =~ m/<person>(.*?)<\/person>/gis ) {
        my $one_persons_info = $1;

        # Each <Education>...</Education> block inside that person.
        while ( $one_persons_info =~ m/<education>(.*?)<\/education>/gis ) {
            my $this_guys_education_details_string = $1;
            my %this_guys_education_details_hash;
            my @this_guys_education_sections;

            # Collect the names of all opening tags (closing tags start with "/").
            while ( $this_guys_education_details_string =~ /<([^\/]*?)>/gis ) {
                push @this_guys_education_sections, $1;
            }

            # Grab the text between each opening tag and its closing tag.
            # \Q...\E quotes the tag name; no /g, which in scalar context would
            # leave pos() set and make the next search start after this match.
            foreach my $single_section (@this_guys_education_sections) {
                if ( $this_guys_education_details_string =~ /<\Q$single_section\E>(.*?)<\/\Q$single_section\E>/is ) {
                    $this_guys_education_details_hash{$single_section} = $1;
                }
            }

            push @people_educations, \%this_guys_education_details_hash;
        }
    }

    dump( \@people_educations );

    __DATA__
    <Person>
      <Address> <name>xxx</name> <mobile>xxx</mobile> </Address>
      <Education> <Degree>xxx</Degree> <Major>xxx</Major> </Education>
    </Person>
    <Person>
      <Address> <name>xxx</name> <mobile>xxx</mobile> </Address>
      <Education> <Degree>xxx</Degree> <Minor>xxx</Minor> <Grade> ggg </Grade> </Education>
    </Person>
    <Person>
      <Address> <name>xxx</name> <mobile>xxx</mobile> </Address>
      <Education> <Degree>xxx</Degree> </Education>
    </Person>


This will produce the following output:

    [
      { Degree => "xxx", Major => "xxx" },
      { Degree => "xxx", Grade => " ggg ", Minor => "xxx" },
      { Degree => "xxx" },
    ]




Re^2: Regarding XML::DOM::Parser
by Jenda (Abbot) on Sep 17, 2009 at 14:18 UTC

    Please don't!

    Please do NOT attempt to parse XML with regexps! It's fragile and error-prone! Did you unescape the data (entities such as &amp; and &lt;)? Did you handle <![CDATA[ ... ]]> sections and comments?
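A minimal core-Perl sketch of the problems raised above, under stated assumptions: the input is hypothetical, and the extraction is the same regex technique as the parent post, trimmed to collect <Education> blocks only and with tag names quoted via \Q...\E so the patterns compile. It feeds that technique an entity, a commented-out element and a CDATA section.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, well-formed input: an entity, a comment hiding an old
# <Education> block, and a CDATA section.
my $xml = <<'XML';
<Person>
  <!-- <Education><Degree>withdrawn</Degree></Education> -->
  <Education>
    <Degree>B.Sc. Maths &amp; Physics</Degree>
    <Minor><![CDATA[a <b>bold</b> minor]]></Minor>
  </Education>
</Person>
XML

my @records;

# Regex extraction as in the parent post: find each <Education> block,
# collect its opening-tag names, then grab the text between open/close tags.
while ( $xml =~ m{<Education>(.*?)</Education>}gis ) {
    my $block = $1;
    my %details;
    my @sections = $block =~ m{<([^/]*?)>}gis;
    for my $section (@sections) {
        # \Q...\E stops a "tag name" like "![CDATA[a <b" from being read
        # as regex syntax (an unmatched "[" would be a fatal error).
        if ( $block =~ m{<\Q$section\E>(.*?)</\Q$section\E>}is ) {
            $details{$section} = $1;
        }
    }
    push @records, \%details;
}

# Print each record with its keys in sorted order.
for my $i ( 0 .. $#records ) {
    print "record $i:\n";
    print "  $_ => [$records[$i]{$_}]\n" for sort keys %{ $records[$i] };
}
```

Running it should print two records. The commented-out degree ("withdrawn") is reported as if it were real, the Degree value keeps the raw "&amp;" instead of "&", and the Minor value still carries the "<![CDATA[ ... ]]>" wrapper. A real XML parser (such as XML::DOM::Parser, which this thread is about) would not treat the comment's contents as elements, would decode the entity and would expose the CDATA contents without the wrapper.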

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.