in reply to Parse XML and compare with Fasta in Perl
Below is a little Perl script that uses XML::LibXML and its XPath abilities to provide a generic command-line method for extracting any specific content from an XML file, so long as you can provide the XPath syntax for the content you want. Given that script, the particular task stated in the OP can be accomplished with this command line (assuming the XML data has the required closing tag, as mentioned in a previous reply, and is stored in a file called "test.xml"):
    exp -p "//info_name | //it_size" test.xml
    # output:
    FZGA34177.b1
    35000
    FZGA34178.b1
    12000
    FZGA34179.b1
    7000
    FZGA34180.b1
    3000
    FZGA34181.b1
    7000

There's a pretty good reference for XPath usage here: http://www.w3schools.com/XPath/default.asp. The code for my "exp" utility is pretty simple:
    #!/usr/bin/perl

    use strict;
    use XML::LibXML;
    use Getopt::Long;

    binmode STDOUT, ":utf8";

    my $Usage = "Usage: $0 [-x] [-r] -p xpath_spec file.xml\n";
    my %opt;
    die $Usage unless ( GetOptions( \%opt, 'x', 'r', 'p=s' ) and
                        @ARGV == 1 and -f $ARGV[0] and $opt{p} =~ /\w/ );

    my $xmlfile = shift;
    my $xml = XML::LibXML->new;
    my $doc;
    if ( ! $opt{r} ) {
        $doc = $xml->parse_file( $xmlfile );
    }
    else {
        my $xmlstr = "<EXP_ROOT_$$>";
        $opt{p} = "/EXP_ROOT_$$" . $opt{p};
        {
            local $/;
            open( X, '<:utf8', $xmlfile ) or die "Unable to read $xmlfile: $!\n";
            $xmlstr .= <X>;
            close X;
        }
        $xmlstr .= "</EXP_ROOT_$$>";
        $doc = $xml->parse_string( $xmlstr );
    }
    my $pth = XML::LibXML::XPathContext->new( $doc );

    for my $n ( $pth->findnodes( $opt{p} )) {
        if ( $opt{x} ) {
            print $n->toString, "\n";
        }
        else {
            print $n->textContent, "\n";
        }
    }

    =head1 NAME

    exp -- extract XPath matches from XML data

    =head1 SYNOPSIS

     exp [-r] [-x] -p xpath_spec file.xml

       -r : supply a root node for the xml stream
       -x : output the matching content as xml elements

    =head1 DESCRIPTION

    This program will print portions (if any) from an XML file that match
    a given XPath specifier.

    =head1 AUTHOR

    David Graff <graff@ldc.upenn.edu>

    =cut
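To see what the -r option is doing without needing a file on disk, here is a minimal sketch of the same idea: wrap rootless XML in a synthetic root element, anchor the XPath at that root, and collect the text of each match. The element names are the ones from the command line above, but the `<record>` wrapper and the two-record sample are assumptions made up for illustration, not the OP's actual data layout.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical rootless fragment: two sibling elements with no enclosing
# root. The <record> nesting is an assumption for this sketch.
my $fragment = <<'XML';
<record><info_name>FZGA34177.b1</info_name><it_size>35000</it_size></record>
<record><info_name>FZGA34178.b1</info_name><it_size>12000</it_size></record>
XML

# Supply a synthetic root so the parser sees a well-formed document.
# The process ID keeps the name unlikely to clash with real element names.
my $root = "EXP_ROOT_$$";
my $doc  = XML::LibXML->new->parse_string( "<$root>$fragment</$root>" );

# Anchor the expression at the synthetic root, the same way "exp -r" does.
my $xpath = "/$root" . '//info_name | //it_size';

# Collect the text content of every matching node.
my $ctx    = XML::LibXML::XPathContext->new( $doc );
my @values = map { $_->textContent } $ctx->findnodes( $xpath );

print "$_\n" for @values;
```

Note that only the first branch of the union gets the root prefix; the second branch (`//it_size`) still searches the whole document, so both branches match either way.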