Angharad has asked for the wisdom of the Perl Monks concerning the following question:

After the discussion here with you guys in my last thread, I've been teaching myself to get information from an XML document using XML::Twig. A portion of the XML file is here:
<sas_residue_annotation xmlns="http://url/Schema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://url/Schema test.xsd">
  <features>
    <feature>
      <feature_name>feature_one</feature_name>
      <residues>
        <residue>
          <residue_index>2</residue_index>
          <residue_name>R</residue_name>
        </residue>
        <residue>
          <residue_index>4</residue_index>
          <residue_name>V</residue_name>
        </residue>
      </residues>
    </feature>
    <feature>
      <feature_name>feature_two</feature_name>
      <residues>
        <residue>
          <residue_index>5</residue_index>
          <residue_name>S</residue_name>
        </residue>
      </residues>
    </feature>
  </features>

So far, I've managed to print out the 'residue_index' and 'residue_name' info from within the 'residue' tag. But I need to expand upon that and print this information only for 'feature_one' and not for 'feature_two'. At the moment it's printing the information regardless of which feature the 'residue_name' and 'residue_index' are in.

Any advice on how I might achieve this would be much appreciated.

Code thus far:

#!/usr/bin/perl

# use module
use Data::Dumper;
use strict;
use warnings;
use XML::Twig;

my $file = shift;
my $twig= new XML::Twig( twig_handlers => { residue => \&residue } );
$twig->parsefile($file);

sub residue {
    my ($twig, $res) = @_;
    my $res_idx  = $res->first_child('residue_index')->text;
    my $res_name = $res->first_child('residue_name')->text;
    print "$res_idx $res_name\n";
}

Replies are listed 'Best First'.
Re: help on how to get information from XML file using XML::Twig requested
by toolic (Bishop) on Aug 10, 2009 at 13:10 UTC
    Change your handler to look for 'feature' elements, and use an XPath expression to narrow down to your 'residue' elements:
    my $xfile = <<EOF;
    <sas_residue_annotation xmlns="http://url/Schema"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://url/Schema test.xsd">
      <features>
        <feature>
          <feature_name>feature_one</feature_name>
          <residues>
            <residue>
              <residue_index>2</residue_index>
              <residue_name>R</residue_name>
            </residue>
            <residue>
              <residue_index>4</residue_index>
              <residue_name>V</residue_name>
            </residue>
          </residues>
        </feature>
        <feature>
          <feature_name>feature_two</feature_name>
          <residues>
            <residue>
              <residue_index>5</residue_index>
              <residue_name>S</residue_name>
            </residue>
          </residues>
        </feature>
      </features>
    </sas_residue_annotation>
    EOF

    use strict;
    use warnings;
    use XML::Twig;

    my $twig= new XML::Twig( twig_handlers => { feature => \&feature } );
    $twig->parse($xfile);

    # Called once for each fully parsed <feature> element.
    sub feature {
        my ($twig, $feat) = @_;
        if ($feat->first_child('feature_name')->text() eq 'feature_one') {
            # Only the residues of the wanted feature are visited.
            for my $res ($feat->findnodes('residues/residue')) {
                my $res_idx  = $res->first_child('residue_index')->text();
                my $res_name = $res->first_child('residue_name' )->text();
                print "$res_idx $res_name\n";
            }
        }
    }

    __END__
    2 R
    4 V
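
    If you would rather keep a handler on 'residue', as in the question, one possible variation is to look upward from each residue to its enclosing 'feature' and check the name there. This is only a sketch under a few assumptions: the helper name residues_for_feature is hypothetical, the XML is a trimmed copy of the sample without the namespace declarations, and it relies on <feature_name> appearing before <residues> so that it has already been parsed when each <residue> closes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (not from the thread): returns "index name" strings
# for every <residue> found inside the <feature> whose name is $wanted.
sub residues_for_feature {
    my ($xml, $wanted) = @_;
    my @rows;

    my $twig = XML::Twig->new(
        twig_handlers => {
            residue => sub {
                my ($t, $res) = @_;

                # parent() with a condition returns the nearest ancestor
                # that matches, here the enclosing <feature>.
                my $feat = $res->parent('feature') or return;
                return unless $feat->first_child_text('feature_name') eq $wanted;

                push @rows, join ' ',
                    $res->first_child_text('residue_index'),
                    $res->first_child_text('residue_name');
            },
        },
    );
    $twig->parse($xml);
    return @rows;
}

# Trimmed sample data (assumption: namespaces removed for brevity).
my $xml = <<'EOF';
<features>
  <feature>
    <feature_name>feature_one</feature_name>
    <residues>
      <residue><residue_index>2</residue_index><residue_name>R</residue_name></residue>
      <residue><residue_index>4</residue_index><residue_name>V</residue_name></residue>
    </residues>
  </feature>
  <feature>
    <feature_name>feature_two</feature_name>
    <residues>
      <residue><residue_index>5</residue_index><residue_name>S</residue_name></residue>
    </residues>
  </feature>
</features>
EOF

print "$_\n" for residues_for_feature($xml, 'feature_one');
```

    The trade-off is that every residue is still visited and tested, whereas the 'feature' handler decides once per feature. For a small file either should be fine.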
Re: help on how to get information from XML file using XML::Twig requested
by Jenda (Abbot) on Aug 10, 2009 at 14:07 UTC
    use strict;
    use XML::Rules;

    my $parser = XML::Rules->new(
        rules => {
            _default => 'content',
            'residue' => sub {
                my ($tag,$attr,$context,$parents) = @_;
                if ($context->[-1] eq 'residues' and $context->[-2] eq 'feature'
                    and $parents->[-2]{feature_name} eq 'feature_one') {
                    print "$attr->{residue_index} $attr->{residue_name}\n"
                }
            },
        }
    );

    $parser->parse(\*DATA);

    __DATA__
    <sas_residue_annotation xmlns="http://url/Schema"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://url/Schema test.xsd">
    ...
    or
    use strict;
    use XML::Rules;

    my $parser = XML::Rules->new(
        rules => {
            _default => 'content',
            '^residues' => sub {
                my ($tag,$attr,$context,$parents) = @_;
                return ($context->[-1] eq 'feature'
                    and $parents->[-1]{feature_name} eq 'feature_one');
            },
            'residue' => sub {
                my ($tag,$attr) = @_;
                print "$attr->{residue_index} $attr->{residue_name}\n"
            },
        }
    );

    $parser->parse(\*DATA);

    __DATA__
    <sas_residue_annotation xmlns="http://url/Schema"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://url/Schema test.xsd">
    ...

    The first version checks the parent and grandparent tag names and the content of the <feature_name> tag whenever a <residue> tag has been fully parsed. The second checks the <feature_name> whenever the opening <residues> tag is found, and skips the contents of <residues> if the <feature_name> is not the one you are looking for.

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.