Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all monks. I have an XML file, fetched from the NCBI database, that contains a list of gene IDs and their properties. The file is structured like this:
    <eSummaryResult>
      <DocSum>
        <Id>25</Id>
        <Item Name="Description" Type="String">Ableson Murine</Item>
        ......
      </DocSum>
      <DocSum> same as above </DocSum>
    </eSummaryResult>
I am using XML::Simple, but I want to know if there is any other module I could use, because I want to write a text file with one column for each value. Thanks in advance.

Re: XML need suggestions
by graff (Chancellor) on Jun 13, 2007 at 03:13 UTC
    If you are using XML::Simple, show us how you are using it. It can most likely do what you want (and there are most likely other ways to do what you want, using some other XML module), but you haven't given us enough information... e.g., what are the "values" that you want to write as "columns" in your output? What should the output look like? What code have you tried?
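    For comparison, here is a minimal sketch of plain XML::Simple producing one tab-separated line per <DocSum>. It assumes the layout shown in the question and that the wanted columns are just the Id and the "Description" item; the ForceArray and KeyAttr settings are an assumption, not something the original poster described.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Sample input shaped like the eSummary excerpt in the question.
my $xml = <<'XML';
<eSummaryResult>
  <DocSum>
    <Id>25</Id>
    <Item Name="Description" Type="String">Ableson Murine</Item>
  </DocSum>
  <DocSum>
    <Id>26</Id>
    <Item Name="Description" Type="String">yada yada</Item>
  </DocSum>
</eSummaryResult>
XML

# ForceArray: keep DocSum and Item as array refs even when only one is present.
# KeyAttr => []: stop XML::Simple from folding arrays into hashes keyed by an attribute.
my $data = XMLin($xml, ForceArray => [qw(DocSum Item)], KeyAttr => []);

my @rows;
for my $doc (@{ $data->{DocSum} }) {
    # Map each <Item Name="..."> to its text, which XML::Simple stores under 'content'.
    my %item = map { $_->{Name} => $_->{content} } @{ $doc->{Item} };
    push @rows, join("\t", $doc->{Id}, $item{Description} // '');
}
print "$_\n" for @rows;
```

    To write a file instead of STDOUT, print the rows to a filehandle opened with open(). Note that XMLin() builds the whole document in memory before anything is printed.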
Re: XML need suggestions
by andreas1234567 (Vicar) on Jun 13, 2007 at 05:41 UTC
Re: XML need suggestions
by Jenda (Abbot) on Jun 13, 2007 at 10:05 UTC

    You could use e.g. XML::Rules, somewhat like this:

        use XML::Rules;

        my $parser = XML::Rules->new(
            rules => [
                Item => sub { $_[1]->{Name} => $_[1]->{_content} },
                    # take only the name and content from the <Item>
                    # and make it available using the Name as the key
                Id => 'content',
                DocSum => sub {
                    print "Id: $_[1]->{Id}\nName: $_[1]->{Name}\nFoo: $_[1]->{Foo1}, $_[1]->{Foo2}\n\n";
                    return; # we are done with the <DocSum>, no need to keep the data
                },
            ]
        );
        $parser->parse($the_file);

    Basically, the Item and Id rules specify what you want from those tags and how, and the DocSum rule consolidates the data from the subtags and processes it the way you need. The (possibly) important difference from XML::Simple is that you process the XML gene by gene, instead of parsing the whole file, creating a possibly huge data structure and then processing that structure.

    You may want to have a look at XML::Twig and XML::LibXML as well.
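    As a rough sketch of the XML::Twig route, assuming the same <DocSum>/<Id>/<Item> layout: a twig_handlers callback fires for each finished <DocSum>, and purge() frees what has already been processed, so like the XML::Rules version it works gene by gene. The choice of columns (Id, then the "Description" item) is illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Sample input shaped like the eSummary excerpt in the question.
my $xml = <<'XML';
<eSummaryResult>
  <DocSum>
    <Id>25</Id>
    <Item Name="Description" Type="String">Ableson Murine</Item>
  </DocSum>
  <DocSum>
    <Id>26</Id>
    <Item Name="Description" Type="String">yada yada</Item>
  </DocSum>
</eSummaryResult>
XML

my @rows;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once for each complete <DocSum> element.
        DocSum => sub {
            my ($t, $doc) = @_;
            # Collect each Item's text keyed by its Name attribute.
            my %item = map { $_->att('Name') => $_->text } $doc->children('Item');
            push @rows, join("\t", $doc->first_child_text('Id'), $item{Description} // '');
            $t->purge;    # discard what has been parsed so far to keep memory use low
        },
    },
);
$twig->parse($xml);
print "$_\n" for @rows;
```

    For a real eSummary file, parsefile() takes a filename in place of parse() on a string.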

    Update 2007-6-14: fixed a typo in the code.

Re: XML need suggestions
by Wonko the sane (Curate) on Jun 13, 2007 at 13:55 UTC
    I find the best way to manipulate XML is with XSLT.
    I took a stab at what I thought you were asking for in output, and added some extra records to print out.
        #!/usr/local/bin/perl -w
        use strict;
        use warnings;
        use XML::LibXML;
        use XML::LibXSLT;

        my $xml = q{<eSummaryResult>
          <DocSum>
            <Id>25</Id>
            <Item Name="Description" Type="String">Ableson Murine</Item>
          </DocSum>
          <DocSum>
            <Id>26</Id>
            <Item Name="Description" Type="String">yada yada</Item>
          </DocSum>
          <DocSum>
            <Id>27</Id>
            <Item Name="Description" Type="String">yada yada something else</Item>
          </DocSum>
        </eSummaryResult>
        };

        my $xslt_stylesheet = q{<?xml version="1.0" encoding="UTF-8"?>
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="xml" omit-xml-declaration="yes"/>
          <xsl:template match="/eSummaryResult/DocSum"><xsl:apply-templates select="Id|Item"/></xsl:template>
          <xsl:template match="Id"><xsl:value-of select="."/>, </xsl:template>
          <xsl:template match="Item[@Name='Description']"><xsl:value-of select="."/></xsl:template>
        </xsl:stylesheet>
        };

        my $parser     = XML::LibXML->new();
        my $xslt       = XML::LibXSLT->new();
        my $style_doc  = $parser->parse_string($xslt_stylesheet);
        my $source     = $parser->parse_string($xml);
        my $stylesheet = $xslt->parse_stylesheet($style_doc);
        my $results    = $stylesheet->transform($source);
        my $output     = $stylesheet->output_string($results);
        print $output;
    Output:

        :!./t2.pl
        25, Ableson Murine
        26, yada yada
        27, yada yada something else
    Hope that helps.
    Best Regards,
    WOnko

      Let me see. What would the code for the very same thing look like using XML::Rules?

          use XML::Rules;

          my $xml = q{<eSummaryResult>
            <DocSum>
              <Id>25</Id>
              <Item Name="Description" Type="String">Ableson Murine</Item>
            </DocSum>
            <DocSum>
              <Id>26</Id>
              <Item Name="Description" Type="String">yada yada</Item>
            </DocSum>
            <DocSum>
              <Id>27</Id>
              <Item Name="Description" Type="String">yada yada something else</Item>
            </DocSum>
          </eSummaryResult>
          };

          my $parser = XML::Rules->new(
              rules => [
                  Item   => sub { $_[1]->{Name} => $_[1]->{_content} },
                  Id     => 'content',
                  DocSum => sub { print "$_[1]->{Id}, $_[1]->{Description}\n"; return; },
              ]
          );
          $parser->parse($xml);
      There's less code needed to get the job done without XSLT than there is just to apply the stylesheet. And I'm sure you'd get the same result using most other modules.
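      To illustrate the "most other modules" point, here is a sketch using XML::LibXML with plain XPath queries and no stylesheet, assuming the same three sample records. Unlike the XML::Rules and XML::Twig approaches, it loads the whole DOM into memory first.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Same sample records as the XSLT example above.
my $xml = <<'XML';
<eSummaryResult>
  <DocSum>
    <Id>25</Id>
    <Item Name="Description" Type="String">Ableson Murine</Item>
  </DocSum>
  <DocSum>
    <Id>26</Id>
    <Item Name="Description" Type="String">yada yada</Item>
  </DocSum>
  <DocSum>
    <Id>27</Id>
    <Item Name="Description" Type="String">yada yada something else</Item>
  </DocSum>
</eSummaryResult>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

my @rows;
for my $doc ($dom->findnodes('/eSummaryResult/DocSum')) {
    # findvalue() evaluates the XPath relative to this DocSum and returns its string value.
    my $id   = $doc->findvalue('Id');
    my $desc = $doc->findvalue('Item[@Name="Description"]');
    push @rows, "$id, $desc";
}
print "$_\n" for @rows;
```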

      I find XSLT incredibly clumsy, blithering and unreadable.