Regarding XML::DOM::Parser

mecrazycoder has asked for the wisdom of the Perl Monks concerning the following question:

Good morning Monks, I know this question has already been asked in this forum, but I was not able to understand the answers. Here is my code:

sub start
{
    @tempArray = ();    # start with an empty array ("= 0" would store a single 0)
    #our $filename=shift;
    our $parser = new XML::DOM::Parser;
    our $doc = $parser->parsefile("1597976.doc001.out.xml") or die "Unable to parse document";
    our $root = $doc->getDocumentElement(); 
    valueIntoArray($root);
}    

sub valueIntoArray #Placing values alone into array 
{
    my ($rt)= @_;
    foreach my $node ( $rt->getChildNodes())         
    {
        if (($node->getNodeType() == TEXT_NODE ) && ($node->getData() =~ /\S/s))
        {
            #push(@tempArray,lc($node->getData()));
            print $node->getData()."\n";
        }
        valueIntoArray($node);
    }
    #return sort(@tempArray);
}

My goal is to retrieve all the values from the XML file, excluding the tags. That works well. But now I want to retrieve the values from one particular area only. Consider an XML file like this:

<Person>
<Address>
<name>xxx</name>
<mobile>xxx</mobile>
</Address>
<Education>
<Degree>xxx</Degree>
<Major>xxx</Major>
</Education>
</Person>

Here I want to retrieve only the values under the Education tag. How can I do that? The Education tag contains various child tags depending on the input, so I can't go by tag name. Please help me.

Re: Regarding XML::DOM::Parser by tmharish (Friar) on Sep 16, 2009 at 15:11 UTC
I find that when I have data that is unpredictable (as is the case here, where the nodes that might come up are not known in advance), it's better to parse it from scratch. I might be totally wrong, but I am not sure the XML you have described is well formed, as is required. Either way, here is a hopefully readable, though not the most efficient, piece of code that achieves what you are aiming to do:

use strict;
use warnings;
use Data::Dump qw( dump );

my $data = do { local $/; <DATA> };
my @people_educations;

while ( $data =~ m/<person>(.*?)<\/person>/gis ) {
    my $one_persons_info = $1;
    while ( $one_persons_info =~ m/<education>(.*?)<\/education>/gis ) {
        my $this_guys_education_details_string = $1;
        my %this_guys_education_details_hash;
        my @this_guys_education_sections;
        while ( $this_guys_education_details_string =~ /<([^\/]*?)>/gis ) {
            push @this_guys_education_sections, $1;
        }
        foreach my $single_section ( @this_guys_education_sections ) {
            if ( $this_guys_education_details_string =~ /<$single_section>(.*?)<\/$single_section>/gis ) {
                $this_guys_education_details_hash{ $single_section } = $1;
            }
        }
        push @people_educations, \%this_guys_education_details_hash;
    }
}

dump( \@people_educations );

__DATA__
<Person>
<Address>
<name>xxx</name>
<mobile>xxx</mobile>
</Address>
<Education>
<Degree>xxx</Degree>
<Major>xxx</Major>
</Education>
</Person>
<Person>
<Address>
<name>xxx</name>
<mobile>xxx</mobile>
</Address>
<Education>
<Degree>xxx</Degree>
<Minor>xxx</Minor>
<Grade> ggg </Grade>
</Education>
</Person>
<Person>
<Address>
<name>xxx</name>
<mobile>xxx</mobile>
</Address>
<Education>
<Degree>xxx</Degree>
</Education>
</Person>

This will produce the following output:

[
  { Degree => "xxx", Major => "xxx" },
  { Degree => "xxx", Grade => " ggg ", Minor => "xxx" },
  { Degree => "xxx" },
]
Re^2: Regarding XML::DOM::Parser by Jenda (Abbot) on Sep 17, 2009 at 14:18 UTC
Please don't! Please do NOT attempt to parse XML with regexps! It's fragile and prone to errors! Did you unescape the data? Did you handle <![CDATA[ ... ]]> sections and comments?

Jenda
Enoch was right!
Enjoy the last years of Rome.
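To make the point about escaping and CDATA concrete, here is a minimal sketch comparing a naive regex with XML::Parser on the same input. The sample string (the R&amp;D text, the CDATA section and the commented-out Major) is made up for illustration and is not from the original post.

```perl
use strict;
use warnings;
use XML::Parser;

# Made-up sample: an escaped ampersand, a CDATA section and a comment.
my $xml = '<Education><Degree>R&amp;D <![CDATA[<honours>]]></Degree>'
        . '<!-- <Major>hidden</Major> --></Education>';

# Naive regex: returns the raw markup between the tags, untouched.
my ($by_regex) = $xml =~ m{<Degree>(.*?)</Degree>}s;

# Parser: the Char handler receives decoded character data only.
# Comments are not passed to it, and CDATA content arrives as plain text.
my $by_parser = '';
XML::Parser->new(
    Handlers => { Char => sub { $by_parser .= $_[1] } },
)->parse($xml);

print "regex:  $by_regex\n";
print "parser: $by_parser\n";
```

The regex result still contains the &amp;amp; entity and the CDATA wrapper, while the parser result is the text a consumer would actually want.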
Re: Regarding XML::DOM::Parser by Jenda (Abbot) on Sep 17, 2009 at 14:23 UTC
You mean you want all the text from any tags in any structure under the <Education> tag? That sounds rather strange, but it would be best handled by a push parser. Something like XML::Parser, where you specify three subroutines: one called for start tags, which checks the tag name and increments a global flag if it's Education; another called for text, which pushes the text into a global array if the flag is set; and a last one called for closing tags, which checks whether the tag name is Education and decrements the flag. That is the most efficient approach and, in this case, actually fairly simple.

On the other hand ... what are you REALLY trying to do? We know this one step, but if we see the bigger picture (the task in which this is just a step), maybe we can give better advice.

Jenda
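The flag-based approach described above could be sketched roughly as follows. This is only an illustration, not code from the reply: the helper name education_text is hypothetical, and the flag and array are kept in closure variables rather than true globals.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper (not from the thread): returns every non-blank text
# chunk found anywhere inside <Education> elements of the given XML string.
sub education_text {
    my ($xml) = @_;
    my $depth = 0;    # greater than 0 while inside an <Education> element
    my @values;

    my $parser = XML::Parser->new(
        Handlers => {
            # Start tag: raise the flag when entering <Education>.
            Start => sub {
                my ( $expat, $tag ) = @_;
                $depth++ if $tag eq 'Education';
            },
            # Text: keep it only while the flag is set and it is not blank.
            # Note: expat may deliver one text run in several pieces.
            Char => sub {
                my ( $expat, $text ) = @_;
                push @values, $text if $depth && $text =~ /\S/;
            },
            # End tag: lower the flag when leaving <Education>.
            End => sub {
                my ( $expat, $tag ) = @_;
                $depth-- if $tag eq 'Education';
            },
        },
    );
    $parser->parse($xml);
    return @values;
}

# Sample document from the question.
my $xml = <<'XML';
<Person>
<Address>
<name>xxx</name>
<mobile>xxx</mobile>
</Address>
<Education>
<Degree>xxx</Degree>
<Major>xxx</Major>
</Education>
</Person>
XML

print "$_\n" for education_text($xml);
```

Because the handlers only compare against the name Education, the child tags under it can vary freely with the input, which is the constraint stated in the question. To read from a file instead of a string, XML::Parser also provides a parsefile method.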