gzayzay has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I am trying to collect a group of data from a text file based on a couple of checks. Example: I have a text file that was produced by searching multiple XML files and extracting the following tags.

<MS_1>
<loc>c:\data\cat.xml</loc>
<words>dog, cat, fish, bird</words>
</MS_1>
<MS_2>
<loc>c:\data\cow.xml</loc>
<words>dog, cat, fish, bird, cow, goat</words>
</MS_2>
<MS_3>
<loc>c:\data\snake.xml</loc>
<words>dog, cat, fish, bird, snake, orange</words>
</MS_3>

What I am trying to do is prompt the user to enter the name of the animal to find. If the user enters snake, for example, I want to return the entire content of <MS_3>..</MS_3>, including the MS_ tags. I have tried the following code but I am getting nowhere. Could any monk kindly help me with this?

print ("Enter the word to search >> ");
chomp ($word = <STDIN>);
my $listen = 0;
open(SEARCH, "<$kwords")
while (<SEARCH>) {
    chomp;
    @array = ();
    foreach($_ =~ /\<MS_/){$listen++;}
    push (@array, $_) if /\<MS_$listen\>/../\<\/MS_$listen\>/;
    if ($_ =~ /$word/i) {
        print("@array[0..$#array]\n");
    }
}

Thanks,

Edman

Replies are listed 'Best First'.
Re: Printing a Tag content
by GrandFather (Saint) on Apr 03, 2006 at 20:43 UTC

    Use XML::Twig or XML::TreeBuilder. Life is not long enough to roll your own XML/HTML/CSV parser. To get you started:

    use strict;
    use warnings;
    use XML::Twig;

    my $twig = XML::Twig->new();
    $twig->parse (do {local $/; <DATA>});

    my @chunks = $twig->root ()->children();
    my %lookup;

    for my $chunk (@chunks) {
        my $words = $chunk->first_child ('words');
        my $text = $words->xml_text ();
        push @{$lookup{$_}}, $chunk for split /,\s+/, $text;
    }

    #do stuff with %lookup

    __DATA__
    <doc>
    <MS_1>
    <loc>c:\data\cat.xml</loc>
    <words>dog, cat, fish, bird</words>
    </MS_1>
    <MS_2>
    <loc>c:\data\cow.xml</loc>
    <words>dog, cat, fish, bird, cow, goat</words>
    </MS_2>
    <MS_3>
    <loc>c:\data\snake.xml</loc>
    <words>dog, cat, fish, bird, snake, orange</words>
    </MS_3>
    </doc>

    DWIM is Perl's answer to Gödel
      I knew someone would come up with the big XML libraries ;-)
      Update: Forgive my teasing. I program Perl mainly for the fun and the power of the language itself.
      Of course the best way is to use XML libraries on XML problems.
Re: Printing a Tag content
by codeacrobat (Chaplain) on Apr 03, 2006 at 20:44 UTC
    You could either use a giant regex or use split with <MS_.> as pattern.
    The following code uses the split way.
    Use capturing parentheses in the split pattern to keep the opening tags.
    Stuff it in a %hash.
    Grep out the elem you wanted.
    Reassemble the record with map.
    Print it and be happy.
    #!/usr/bin/perl -w
    print ("Enter the word to search >> ");
    chomp ($word = <STDIN>);
    $/=undef;
    $_=<>;
    my @arr = split /(<MS_.>)/;
    shift @arr; # 1st element stuff before <MS_1>, so shift away
    my %hash = @arr;
    print map {($_, $hash{$_})} grep { $hash{$_} =~ $word } keys %hash;
      Thanks,

      Is this code of yours meant to go into my while loop? If not, the file with the tags is a .txt file and I will need to open it and read the various lines. So how do I go about doing this?

      Edman

        No, sorry. I was describing a way to read the XML from STDIN. If you want to use your $kwords file instead, use the following version.
        open SEARCH, "<$kwords" or die "can't open $kwords $!\n";
        {
            local $/=undef;
            $_ = <SEARCH>;
            my @arr = split /(<MS_.>)/;
            shift @arr; # 1st element stuff before <MS_1>, so shift away
            my %hash = @arr;
            print map {($_, $hash{$_})} grep { $hash{$_} =~ $word } keys %hash;
        }
        close SEARCH;
      Okies, here is one more for the TIMTOWTDI and the golf fans.
      This is probably bad style, but see for yourself.
      #!/usr/bin/perl -w
      print ("Enter the word to search >> ");
      chomp ($word = <STDIN>);
      {
          local $/=undef;
          print grep { $_ =~ $word } <DATA> =~ m!(<MS_\d+>.*?</MS_\d+>)!gs;
      }
      __DATA__
      <MS_1>
      <loc>c:\data\cat.xml</loc>
      <words>dog, cat, fish, bird</words>
      </MS_1>
      <MS_2>
      <loc>c:\data\cow.xml</loc>
      <words>dog, cat, fish, bird, cow, goat</words>
      </MS_2>
      <MS_3>
      <loc>c:\data\snake.xml</loc>
      <words>dog, cat, fish, bird, snake, orange</words>
      </MS_3>
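The regex versions above treat the search word as a pattern and match it anywhere in the record, so "cat" would also hit the c:\data\cat.xml path, and the split version's <MS_.> only allows a single-character id. A minimal sketch of one possible tightening, using core Perl only, is below. The sub name find_records and the hard-coded $word are hypothetical, introduced for illustration; in the real script the word would come from STDIN and the text from the $kwords file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return every <MS_n>...</MS_n>
# record in $text whose <words> list contains $word as a whole entry,
# compared case-insensitively as a plain string rather than as a regex.
sub find_records {
    my ($text, $word) = @_;
    my $wanted = lc $word;
    my @hits;

    # \d+ accepts multi-digit ids such as MS_10; \2 makes the closing tag
    # carry the same number as the opening tag.
    while ($text =~ m{(<MS_(\d+)>.*?</MS_\2>)}gs) {
        my $record = $1;

        # Only look inside <words>, so the <loc> path cannot match.
        my ($words) = $record =~ m{<words>(.*?)</words>}s or next;
        $words =~ s/^\s+|\s+$//g;

        push @hits, $record
            if grep { lc($_) eq $wanted } split /\s*,\s*/, $words;
    }
    return @hits;
}

# Sample data copied from the question; the single-quoted heredoc keeps
# the backslashes in the paths literal.
my $data = <<'END_DATA';
<MS_1>
<loc>c:\data\cat.xml</loc>
<words>dog, cat, fish, bird</words>
</MS_1>
<MS_2>
<loc>c:\data\cow.xml</loc>
<words>dog, cat, fish, bird, cow, goat</words>
</MS_2>
<MS_3>
<loc>c:\data\snake.xml</loc>
<words>dog, cat, fish, bird, snake, orange</words>
</MS_3>
END_DATA

# Hard-coded for the sketch; replace with: chomp(my $word = <STDIN>);
my $word = 'snake';
print "$_\n" for find_records($data, $word);
```

Comparing with eq instead of =~ means a user typing something like "c.t" or "(" neither matches unexpectedly nor breaks the pattern. This is still string matching on markup, so the XML::Twig approach remains the safer choice if the files are real XML.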