mecrazycoder has asked for the wisdom of the Perl Monks concerning the following question:

Good morning Monks, I know this question has already been asked in this forum, but I wasn't able to understand the answers. Here is my code:
sub start
{
    @tempArray=0;
    #our $filename=shift;
    our $parser = new XML::DOM::Parser;
    our $doc = $parser->parsefile("1597976.doc001.out.xml") or die "Unable to parse document";
    our $root = $doc->getDocumentElement();
    valueIntoArray($root);
}

sub valueIntoArray #Placing values alone into array
{
    my ($rt)= @_;
    foreach my $node ( $rt->getChildNodes())
    {
        if (($node->getNodeType() == TEXT_NODE ) && ($node->getData()=~ /\S/s))
        {
            #push(@tempArray,lc($node->getData()));
            print $node->getData()."\n";
        }
        valueIntoArray($node);
    }
    #return sort(@tempArray);
}
My goal is to retrieve all the values from an XML file, excluding the tags. It works well. But now I want to retrieve values from one particular area. Consider an XML file like this:
<Person>
  <Address>
    <name>xxx</name>
    <mobile>xxx</mobile>
  </Address>
  <Education>
    <Degree>xxx</Degree>
    <Major>xxx</Major>
  </Education>
</Person>
Here I want to retrieve only the values under the Education tag. How can I do that? The Education tag can contain various child tags depending on the input, so I can't go by tag name. Please help me.

Replies are listed 'Best First'.
Re: Regarding XML::DOM::Parser
by tmharish (Friar) on Sep 16, 2009 at 15:11 UTC
    I find that when I have data that is unpredictable (as is the case here, where the nodes that might come up are not known in advance), it's better to parse it from scratch.

    I might be totally wrong, but I am not sure that the XML you have described is well-formed, as is required.

    Either way, here is a hopefully readable, though not the most efficient, piece of code that achieves what you are aiming to do:

    use strict;
    use warnings;
    use Data::Dump qw( dump );

    my $data = do { local $/; <DATA> };
    my @people_educations;

    while ( $data =~ m/<person>(.*?)<\/person>/gis ) {
        my $one_persons_info = $1;
        while ( $one_persons_info =~ m/<education>(.*?)<\/education>/gis ) {
            my $this_guys_education_details_string = $1;
            my %this_guys_education_details_hash;
            my @this_guys_education_sections;
            while ( $this_guys_education_details_string =~ /<([^\/]*?)>/gis ) {
                push @this_guys_education_sections, $1;
            }
            foreach my $single_section (@this_guys_education_sections) {
                if ( $this_guys_education_details_string =~ /<$single_section>(.*?)<\/$single_section>/gis ) {
                    $this_guys_education_details_hash{$single_section} = $1;
                }
            }
            push @people_educations, \%this_guys_education_details_hash;
        }
    }

    dump( \@people_educations );

    __DATA__
    <Person>
      <Address>
        <name>xxx</name>
        <mobile>xxx</mobile>
      </Address>
      <Education>
        <Degree>xxx</Degree>
        <Major>xxx</Major>
      </Education>
    </Person>
    <Person>
      <Address>
        <name>xxx</name>
        <mobile>xxx</mobile>
      </Address>
      <Education>
        <Degree>xxx</Degree>
        <Minor>xxx</Minor>
        <Grade> ggg </Grade>
      </Education>
    </Person>
    <Person>
      <Address>
        <name>xxx</name>
        <mobile>xxx</mobile>
      </Address>
      <Education>
        <Degree>xxx</Degree>
      </Education>
    </Person>


    This will produce the following output:

    [
      { Degree => "xxx", Major => "xxx" },
      { Degree => "xxx", Grade => " ggg ", Minor => "xxx" },
      { Degree => "xxx" },
    ]




      Please don't!

      Please do NOT attempt to parse XML with regexps! It's fragile and prone to errors! Did you unescape the data? Did you handle <![CDATA[ ... ]]> sections and comments?
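
For comparison, here is a minimal sketch that stays with XML::DOM, the module the original poster already uses, and lets the parser deal with comments, escaped characters and CDATA. It assumes the element of interest is always named Education; `collect_text` and `education_values` are hypothetical helper names made up for this illustration, and the sample document is invented.

```perl
use strict;
use warnings;
use XML::DOM;    # also exports the node type constants such as TEXT_NODE

# Append every non-blank text value found at or below $node to @$out.
sub collect_text {
    my ($node, $out) = @_;
    for my $child ($node->getChildNodes) {
        my $type = $child->getNodeType;
        if ($type == TEXT_NODE() || $type == CDATA_SECTION_NODE()) {
            my $data = $child->getData;
            push @$out, $data if $data =~ /\S/;
        }
        else {
            # Elements are descended into; comments have no children,
            # so they contribute nothing.
            collect_text($child, $out);
        }
    }
}

# Hypothetical helper: returns the text values under every <Education>.
sub education_values {
    my ($xml) = @_;
    my $doc = XML::DOM::Parser->new->parse($xml);
    my @values;
    collect_text($_, \@values) for $doc->getElementsByTagName('Education');
    $doc->dispose;    # XML::DOM trees are circular, so free them explicitly
    return @values;
}

# Invented sample input with a comment, an entity and a CDATA section.
my $xml = <<'XML';
<Person>
  <Address><name>xxx</name></Address>
  <Education>
    <!-- a comment a regexp would have to skip -->
    <Degree>R&amp;D</Degree>
    <Major><![CDATA[Physics & Maths]]></Major>
  </Education>
</Person>
XML

print "$_\n" for education_values($xml);
```

For a file on disk, `parsefile` can be used in place of `parse`. If Education elements can be nested inside each other, this sketch would report the inner values more than once.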

      Jenda
      Enoch was right!
      Enjoy the last years of Rome.

Re: Regarding XML::DOM::Parser
by Jenda (Abbot) on Sep 17, 2009 at 14:23 UTC

    You mean you want all the text from any tags in any structure under the <Education> tag? That sounds rather strange, but it would be best handled by a push parser such as XML::Parser. You specify three subroutines: one called for start tags, which checks the tag name and increments a global flag if it's Education; one called for text, which pushes the text onto a global array if the flag is set; and one called for closing tags, which checks whether the tag name is Education and decrements the flag.

    Most efficient and in this case actually fairly simple.
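
The three handlers described above might look roughly like this. It is only a sketch: `education_text` is a hypothetical helper name, the flag and the array are kept in lexical variables captured by the handlers rather than in true globals, and text is accumulated per element because the Char handler may be called several times for a single run of text.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: returns the non-blank text found inside <Education>.
sub education_text {
    my ($xml) = @_;
    my $depth = 0;    # greater than 0 while inside an <Education> element
    my @chunks;       # one slot per run of text between two tags

    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my ($expat, $tag) = @_;
                $depth++ if $tag eq 'Education';
                push @chunks, '' if $depth;    # start a fresh text slot
            },
            Char => sub {
                my ($expat, $text) = @_;
                $chunks[-1] .= $text if $depth;
            },
            End => sub {
                my ($expat, $tag) = @_;
                push @chunks, '' if $depth;    # text after this tag is separate
                $depth-- if $tag eq 'Education';
            },
        },
    );
    $parser->parse($xml);

    # Drop the slots that hold only whitespace between tags.
    return grep { /\S/ } @chunks;
}

# The sample document from the question.
my $xml = <<'XML';
<Person>
  <Address>
    <name>xxx</name>
    <mobile>xxx</mobile>
  </Address>
  <Education>
    <Degree>xxx</Degree>
    <Major>xxx</Major>
  </Education>
</Person>
XML

print "$_\n" for education_text($xml);
```

To read from a file instead of a string, `$parser->parsefile($filename)` can replace the `parse` call. Using a counter rather than a boolean keeps the flag correct even if Education elements are nested.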

    On the other hand ... what are you REALLY trying to do? We know this one step, but if we see the bigger picture (the task in which this is just a step), we may be able to give better advice.

    Jenda