Ever since I got acquainted with XPath syntax (finally! Why did I wait so long??) and the really excellent libxml2 library from the GNOME project (which has a thorough, well-documented Perl wrapper, XML::LibXML), I've been having a lot more fun pulling stuff out of XML streams.

Below is a little Perl script that uses XML::LibXML and its XPath support to provide a generic command-line way of extracting any specific content from an XML file, as long as you can supply the XPath expression for the content you want. With that script, the task stated in the OP can be done with this command line (assuming the XML data has the required closing tag, as mentioned in a previous reply, and is stored in a file called "test.xml"):

exp -p "//info_name | //it_size" test.xml

# output:
FZGA34177.b1
35000
FZGA34178.b1
12000
FZGA34179.b1
7000
FZGA34180.b1
3000
FZGA34181.b1
7000
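If you'd rather run the same query from inside your own Perl code instead of from the command line, here is a minimal sketch of what that invocation boils down to. The `<records>`/`<record>` layout of the sample is an assumption made for illustration (the OP's full structure isn't reproduced here); only the element names `info_name` and `it_size` and the sample values come from the thread, and the helper name `xpath_texts` is hypothetical.

```perl
#!/usr/bin/perl
# Sketch of what `exp -p "//info_name | //it_size"` does, run against a
# small inline document. The <records>/<record> layout is hypothetical;
# only the element names info_name and it_size come from the thread.
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: return the text content of every node matching
# $xpath, in document order.
sub xpath_texts {
    my ( $xml_string, $xpath ) = @_;
    my $doc = XML::LibXML->new->parse_string( $xml_string );
    my $ctx = XML::LibXML::XPathContext->new( $doc );
    return map { $_->textContent } $ctx->findnodes( $xpath );
}

my $sample = <<'XML';
<records>
  <record><info_name>FZGA34177.b1</info_name><it_size>35000</it_size></record>
  <record><info_name>FZGA34178.b1</info_name><it_size>12000</it_size></record>
</records>
XML

# The union operator "|" merges both node sets; the result comes back in
# document order, so each name is followed by its own size.
print "$_\n" for xpath_texts( $sample, '//info_name | //it_size' );
```

Run as-is, it prints FZGA34177.b1, 35000, FZGA34178.b1 and 12000 on separate lines. Because XPath's `|` returns the combined node set in document order, names and sizes come out interleaved rather than grouped by element type, which is exactly the pairing the OP needed.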
There's a pretty good reference for XPath usage here: http://www.w3schools.com/XPath/default.asp. The code for my "exp" utility is pretty simple:
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use Getopt::Long;

binmode STDOUT, ":utf8";

my $Usage = "Usage: $0 [-x] [-r] -p xpath_spec file.xml\n";
my %opt;
die $Usage unless ( GetOptions( \%opt, 'x', 'r', 'p=s' )
                    and @ARGV == 1 and -f $ARGV[0]
                    and defined $opt{p} and $opt{p} =~ /\w/ );
my $xmlfile = shift;

my $xml = XML::LibXML->new;
my $doc;
if ( ! $opt{r} ) {
    $doc = $xml->parse_file( $xmlfile );
}
else {
    # -r: wrap the file contents in a synthetic root element (named with
    # the process id) so that a stream of several top-level elements
    # still parses, and anchor the XPath spec at that root
    my $xmlstr = "<EXP_ROOT_$$>";
    $opt{p} = "/EXP_ROOT_$$" . $opt{p};
    {
        local $/;    # slurp the whole file at once
        open( my $fh, '<:utf8', $xmlfile ) or die "Unable to read $xmlfile: $!\n";
        $xmlstr .= <$fh>;
        close $fh;
    }
    $xmlstr .= "</EXP_ROOT_$$>";
    $doc = $xml->parse_string( $xmlstr );
}

my $pth = XML::LibXML::XPathContext->new( $doc );
for my $n ( $pth->findnodes( $opt{p} )) {
    if ( $opt{x} ) {
        print $n->toString, "\n";       # -x: print the matching node as XML
    }
    else {
        print $n->textContent, "\n";    # default: print its text content only
    }
}

=head1 NAME

exp -- extract XPath matches from XML data

=head1 SYNOPSIS

exp [-r] [-x] -p xpath_spec file.xml

 -r : supply a root node for the xml stream
 -x : output the matching content as xml elements

=head1 DESCRIPTION

This program will print portions (if any) from an XML file that match
a given XPath specifier.

=head1 AUTHOR

David Graff <graff@ldc.upenn.edu>

=cut
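The `-r` option is easiest to understand with a tiny case. The sketch below (the element names `entry` and `WRAP` are made up for illustration) shows that a sequence of sibling top-level elements is rejected by the parser as-is, because a well-formed XML document must have exactly one root element, but parses once it is wrapped in a synthetic root, which is the same trick the script plays with its `EXP_ROOT_$$` element.

```perl
#!/usr/bin/perl
# Sketch of why -r exists: sibling top-level elements are not a
# well-formed document until wrapped in a single root element.
# The element names <entry> and <WRAP> are hypothetical.
use strict;
use warnings;
use XML::LibXML;

my $stream = '<entry>one</entry><entry>two</entry>';
my $parser = XML::LibXML->new;

# Without a root: parse_string dies with a parse error, caught by eval.
my $ok = eval { $parser->parse_string( $stream ); 1 };
print "unwrapped: ", ( $ok ? "parsed" : "parse error" ), "\n";

# With a synthetic root the same content parses, and an XPath anchored
# at that root (as exp does with "/EXP_ROOT_$$" . $opt{p}) finds both.
my $doc   = $parser->parse_string( "<WRAP>$stream</WRAP>" );
my @texts = map { $_->textContent } $doc->findnodes( '/WRAP//entry' );
print "wrapped: @texts\n";
```

One consequence of wrapping the raw text this way is that a file beginning with an `<?xml ...?>` declaration won't parse under `-r`, since that declaration is only allowed at the very start of a document; `-r` is meant for bare streams of elements.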

In reply to Re: Parse XML and compare with Fasta in Perl by graff
in thread Parse XML and compare with Fasta in Perl by ad23
