HHCHANG has asked for the wisdom of the Perl Monks concerning the following question:
The XML file is from PubMed: http://www.ncbi.nlm.nih.gov/pubmed/?term=1766380&report=xml&format=text
I can read it into a hash that counts how often each value occurs.
This is my Perl script:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # use module
    use XML::Simple;
    use Data::Dumper;

    our %pubmed_data;

    my $xml  = XML::Simple->new( KeyAttr => [] );
    my $data = $xml->XMLin("data1.txt");
    traverse($data);

    sub traverse {
        our %pubmed_data;
        my ($element) = @_;
        if ( ref($element) =~ /HASH/ ) {
            foreach my $key ( keys %$element ) {
                traverse( $$element{$key} );
            }
        }
        elsif ( ref($element) =~ /ARRAY/ ) {
            traverse($_) foreach @$element;
        }
        else {
            if ( exists $pubmed_data{$element} ) {
                $pubmed_data{$element}++;
            }
            else {
                $pubmed_data{$element} = 1;
            }
        }
    }

However, the XML contains many attributes that I don't want. For example,

    <AuthorList CompleteYN="Y">
      <Author ValidYN="Y">
        <LastName>Miller</LastName>
        <ForeName>S I</ForeName>
        <Initials>SI</Initials>
      </Author>
    </AuthorList>

I only want the element text: Miller, S I, SI. I don't need the attribute values CompleteYN="Y" and ValidYN="Y".
Any help would be great. Thanks in advance!
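
A minimal sketch of one way to count only element text, assuming XML::LibXML is installed from CPAN. It is swapped in here for XML::Simple, which folds attributes and child elements into the same hash, so the two are hard to tell apart after parsing. The helper name `count_text_nodes` and the inline sample string are hypothetical and only serve the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;    # assumption: available from CPAN; not used in the original script

# Hypothetical helper: count how often each non-blank text node occurs.
# Attributes such as CompleteYN="Y" are attribute nodes, not text nodes,
# so the XPath expression //text() never selects them.
sub count_text_nodes {
    my ($xml_string) = @_;
    my $doc = XML::LibXML->load_xml( string => $xml_string );
    my %count;
    for my $node ( $doc->findnodes('//text()') ) {
        my $text = $node->nodeValue;
        $text =~ s/^\s+|\s+$//g;     # trim surrounding whitespace
        next unless length $text;    # skip whitespace-only nodes between tags
        $count{$text}++;
    }
    return \%count;
}

# Sample taken from the question above.
my $sample = <<'XML';
<AuthorList CompleteYN="Y">
  <Author ValidYN="Y">
    <LastName>Miller</LastName>
    <ForeName>S I</ForeName>
    <Initials>SI</Initials>
  </Author>
</AuthorList>
XML

my $counts = count_text_nodes($sample);
print "$_: $counts->{$_}\n" for sort keys %$counts;
```

For the sample this should list Miller, S I and SI with a count of 1 each, and no entry for Y. To read the file from the question instead of a string, `load_xml( location => 'data1.txt' )` could be used in place of the `string` argument.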
Re: parse xml from pubmed without attribute
by Marshall (Canon) on Sep 21, 2013 at 06:26 UTC

Re: parse xml from pubmed without attribute
by Anonymous Monk on Sep 21, 2013 at 05:07 UTC