HHCHANG has asked for the wisdom of the Perl Monks concerning the following question:

I want to parse XML from PubMed.

The XML file is from PubMed: http://www.ncbi.nlm.nih.gov/pubmed/?term=1766380&report=xml&format=text

I can read it into a hash that counts how often each value occurs.

This is my Perl script:
#!/usr/bin/perl
use strict;
use warnings;

# use module
use XML::Simple;
use Data::Dumper;

our %pubmed_data;

my $xml  = XML::Simple->new(KeyAttr => []);
my $data = $xml->XMLin("data1.txt");

traverse($data);

sub traverse {
    our %pubmed_data;
    my ($element) = @_;
    if (ref($element) =~ /HASH/) {
        foreach my $key (keys %$element) {
            traverse($$element{$key});
        }
    }
    elsif (ref($element) =~ /ARRAY/) {
        traverse($_) foreach @$element;
    }
    else {
        if (exists $pubmed_data{$element}) {
            $pubmed_data{$element}++;
        }
        else {
            $pubmed_data{$element} = 1;
        }
    }
}

However, the XML contains many attributes that I don't want. For example:

<AuthorList CompleteYN="Y">
    <Author ValidYN="Y">
        <LastName>Miller</LastName>
        <ForeName>S I</ForeName>
        <Initials>SI</Initials>
    </Author>
</AuthorList>

I just want the element values: Miller, S I, SI. I don't need the attribute values CompleteYN="Y" and ValidYN="Y".

Any help would be great. Thanks in advance!

Replies are listed 'Best First'.
Re: parse xml from pubmed without attribute
by Marshall (Canon) on Sep 21, 2013 at 06:26 UTC
    "I want to parse xml from pubmed."
    This is not the right idea.

    Pubmed provides a huge amount of software to access their site in C, C++ and Perl.

    Toolkits:
    http://www.ncbi.nlm.nih.gov/guide/howto/dwn-software/

    Bio-Perl Toolkit:
    http://www.ncbi.nlm.nih.gov/pmc/articles/PMC187536/

    There is a whole bunch of this stuff available for
    Perl and C. There is so much stuff available that it
    might take a few days to discover and understand the
    options.

    I recommend that you look at the free software that is
    available and see what you can do with it.

    I just gave a few links to get started.
    There are many more.

Re: parse xml from pubmed without attribute
by Anonymous Monk on Sep 21, 2013 at 05:07 UTC
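If you stay with XML::Simple, one option is its NoAttr option, which makes XMLin() ignore attributes when parsing. The sketch below is only an illustration under that assumption: it parses the AuthorList fragment from the question as an inline string rather than data1.txt, and count_text is a made-up helper name, not part of any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

# Fragment copied from the question; a real run would pass "data1.txt" instead.
my $sample = q{<AuthorList CompleteYN="Y">
  <Author ValidYN="Y">
    <LastName>Miller</LastName>
    <ForeName>S I</ForeName>
    <Initials>SI</Initials>
  </Author>
</AuthorList>};

# NoAttr => 1 tells XMLin() to skip every attribute in the input,
# so CompleteYN and ValidYN never reach the data structure.
my $xs   = XML::Simple->new(KeyAttr => [], NoAttr => 1);
my $data = $xs->XMLin($sample);

my %count;
count_text($data, \%count);
print "$_ => $count{$_}\n" for sort keys %count;

# Hypothetical helper: walk hashes and arrays, count each text leaf.
sub count_text {
    my ($node, $count) = @_;
    if (ref $node eq 'HASH') {
        count_text($_, $count) for values %$node;
    }
    elsif (ref $node eq 'ARRAY') {
        count_text($_, $count) for @$node;
    }
    elsif (defined $node) {
        $count->{$node}++;
    }
}
```

With attributes dropped at parse time, the traversal itself does not need to know which hash keys came from attributes and which from child elements, a distinction XML::Simple does not keep once the data is folded into hashes.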