help on how to get information from XML file using XML::Twig requested

Angharad has asked for the wisdom of the Perl Monks concerning the following question:

After the discussion here in my last thread, I've been teaching myself to extract information from an XML document using XML::Twig. A portion of the XML file is shown here:

<sas_residue_annotation xmlns="http://url/Schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://url/Schema test.xsd">
<features>
<feature>
<feature_name>feature_one</feature_name>
<residues>
<residue>
<residue_index>2</residue_index>
<residue_name>R</residue_name>
</residue>
<residue>
<residue_index>4</residue_index>
<residue_name>V</residue_name>
</residue>
</residues>
</feature>
<feature>
<feature_name>feature_two</feature_name>
<residues>
<residue>
<residue_index>5</residue_index>
<residue_name>S</residue_name>
</residue>
</residues>
</feature>
</features>

So far, I've managed to print the 'residue_index' and 'residue_name' values from within each 'residue' tag. But I need to expand on that and print this information only for 'feature_one', not for 'feature_two'. At the moment it's printing the information regardless of which feature the 'residue_name' and 'residue_index' belong to.

Any advice on how I might achieve this would be much appreciated.

Code thus far:


#!/usr/bin/perl

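# load modules
use strict;
use warnings;
use XML::Twig;

# XML file to parse, given as the first command-line argument
my $file = shift;

# call residue() each time a <residue> element has been fully parsed
my $twig = XML::Twig->new(
    twig_handlers => { residue => \&residue }
);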

$twig->parsefile($file);

sub residue
{
    my ($twig, $res) = @_;

    my $res_idx = $res->first_child('residue_index')->text;
    my $res_name = $res->first_child('residue_name')->text;

    print "$res_idx $res_name\n";
      
}

Re: help on how to get information from XML file using XML::Twig requested by toolic (Bishop) on Aug 10, 2009 at 13:10 UTC
Change your handler to look for 'feature' elements, and use an XPath expression to scope down to your 'residue' elements:

my $xfile = <<EOF;
<sas_residue_annotation xmlns="http://url/Schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://url/Schema test.xsd">
<features>
<feature>
<feature_name>feature_one</feature_name>
<residues>
<residue>
<residue_index>2</residue_index>
<residue_name>R</residue_name>
</residue>
<residue>
<residue_index>4</residue_index>
<residue_name>V</residue_name>
</residue>
</residues>
</feature>
<feature>
<feature_name>feature_two</feature_name>
<residues>
<residue>
<residue_index>5</residue_index>
<residue_name>S</residue_name>
</residue>
</residues>
</feature>
</features>
</sas_residue_annotation>
EOF

use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new(
    twig_handlers => { feature => \&feature }
);

$twig->parse($xfile);

sub feature
{
    my ($twig, $feat) = @_;

    # only report residues that belong to 'feature_one'
    if ($feat->first_child('feature_name')->text() eq 'feature_one') {
        for my $res ($feat->findnodes('residues/residue')) {
            my $res_idx  = $res->first_child('residue_index')->text();
            my $res_name = $res->first_child('residue_name' )->text();
            print "$res_idx $res_name\n";
        }
    }
}

__END__
2 R
4 V
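
If you would rather keep the handler on 'residue' as in the original script, another option is to look upward from each residue to its enclosing feature. The following is only a sketch under a few assumptions: residues_for() is a hypothetical helper name made up for this example, the sample XML is the snippet from the question with the root element closed, and it relies on <feature_name> appearing before <residues> inside each <feature>, as it does in the sample.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (not from the thread): returns "index name" strings for
# every <residue> whose enclosing <feature> has the wanted <feature_name>.
sub residues_for {
    my ($xml, $wanted) = @_;
    my @found;

    my $twig = XML::Twig->new(
        twig_handlers => {
            residue => sub {
                my ($t, $res) = @_;

                # parent() with a condition walks up to the nearest
                # ancestor named 'feature'
                my $feature = $res->parent('feature') or return;

                # <feature_name> has already been parsed at this point
                # because it precedes <residues> in the document
                return unless $feature->first_child_text('feature_name') eq $wanted;

                push @found, join ' ',
                    $res->first_child_text('residue_index'),
                    $res->first_child_text('residue_name');
            },
        },
    );

    $twig->parse($xml);
    return @found;
}

my $xml = <<'EOF';
<sas_residue_annotation xmlns="http://url/Schema">
<features>
<feature>
<feature_name>feature_one</feature_name>
<residues>
<residue><residue_index>2</residue_index><residue_name>R</residue_name></residue>
<residue><residue_index>4</residue_index><residue_name>V</residue_name></residue>
</residues>
</feature>
<feature>
<feature_name>feature_two</feature_name>
<residues>
<residue><residue_index>5</residue_index><residue_name>S</residue_name></residue>
</residues>
</feature>
</features>
</sas_residue_annotation>
EOF

print "$_\n" for residues_for($xml, 'feature_one');
```

With the sample data this should print "2 R" and "4 V". The trade-off compared with a 'feature' handler is that the check runs once per residue rather than once per feature, which should not matter for small files.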
Re: help on how to get information from XML file using XML::Twig requested by Jenda (Abbot) on Aug 10, 2009 at 14:07 UTC
use strict;
use XML::Rules;

my $parser = XML::Rules->new(
    rules => {
        _default => 'content',
        'residue' => sub {
            my ($tag,$attr,$context,$parents) = @_;
            if ($context->[-1] eq 'residues' and $context->[-2] eq 'feature'
                and $parents->[-2]{feature_name} eq 'feature_one') {
                print "$attr->{residue_index} $attr->{residue_name}\n"
            }
        },
    }
);

$parser->parse(\*DATA);

__DATA__
<sas_residue_annotation xmlns="http://url/Schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://url/Schema test.xsd">
...

or

use strict;
use XML::Rules;

my $parser = XML::Rules->new(
    rules => {
        _default => 'content',
        '^residues' => sub {
            my ($tag,$attr,$context,$parents) = @_;
            return ($context->[-1] eq 'feature'
                and $parents->[-1]{feature_name} eq 'feature_one');
        },
        'residue' => sub {
            my ($tag,$attr) = @_;
            print "$attr->{residue_index} $attr->{residue_name}\n"
        },
    }
);

$parser->parse(\*DATA);

__DATA__
<sas_residue_annotation xmlns="http://url/Schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://url/Schema test.xsd">
...

The first version checks the parent and grandparent tag names and the content of the <feature_name> tag whenever a <residue> tag has been fully parsed. The second checks <feature_name> as soon as the opening <residues> tag is found and skips that element's contents if <feature_name> is not the one you are looking for.

Jenda