Hi monks-of-perliness, I'm having a problem getting the data out of an XML file with XML::Simple.
The xml file contains information in the form:
<project by="company" name="personname">
<pattern name="company-000001" owner="company" description="Microarray" species_database="d.base">
<reporter name="A_24_P344666" systematic_name="NM_020341">
<feature number="1780">
<position x="0.733234841870825" y="10.033" units="mm" />
</feature>
<gene systematic_name="NM_020341" primary_name="PAK7" description="Homo sapiens p21(CDKN1A)-activated kinase 7 (PAK7), transcript variant 1, mRNA [NM_020341]">
<accession database="ref" id="NM_020341" />
<accession database="ref" id="NM_177990" />
<accession database="ens" id="ENST00000378429" />
<accession database="ens" id="ENST00000378423" />
<other name="accessions" value="ref|NM_020341|ref|NM_177990|ens|ENST00000378429|ens|ENST00000378423" />
<other name="chr_coord" value="chr20:9466136-9466077" />
</gene>
</reporter>
<reporter ...
</reporter>
<reporter ...
</reporter>
</pattern>
</project>
I want to pick out the different names (instances of name, systematic_name and primary_name), put them into an array, and then store this array as a value in a hash, with the key being the reporter's name attribute.
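To make that concrete, here is a hand-written sketch of the structure I'm after for the sample reporter above. Nothing in it comes from XML::Simple; the variable name %names_for is made up, and I'm assuming a value that appears twice (NM_020341 is the systematic_name on both the reporter and the gene) only needs to be stored once:

```perl
use strict;
use warnings;

# Desired end result, typed in by hand from the sample data.
# Key:   the reporter's name attribute.
# Value: array ref of the distinct names found under that reporter
#        (name, systematic_name, primary_name).
# %names_for is a hypothetical name used only for this illustration.
my %names_for = (
    'A_24_P344666' => [ 'A_24_P344666', 'NM_020341', 'PAK7' ],
);

# Walk the hash and print each reporter with its collected names.
for my $probe ( sort keys %names_for ) {
    print "$probe: @{ $names_for{$probe} }\n";
}
```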
If I read this into Perl with XML::Simple and don't specify KeyAttr, then when I try:
my @probekeys = keys %{$data->{pattern}->{reporter}};
then it tries to use name as the hash key and reads these into an array (which is what I want), but it falls over because name turns out not to be unique.
If I read this into Perl with XML::Simple and specify a different attribute as the key attribute using KeyAttr, then when I try:
my @probekeys = keys %{$data->{pattern}->{reporter}};
then XML::Simple insists on forcing things into arrays (even if I use ForceArray => 0) and I keep getting told that pseudo-hashes are deprecated.
Does anyone have any smart ideas about how I can get hold of this information without just reading the 500k-line file one line at a time?