Hi monks-of-perliness, I'm having a problem getting the data out of an XML file with XML::Simple.

The XML file contains information in the form:

<project by="company" name="personname">
  <pattern name="company-000001" owner="company" description="Microarray" species_database="d.base">
    <reporter name="A_24_P344666" systematic_name="NM_020341">
      <feature number="1780">
        <position x="0.733234841870825" y="10.033" units="mm" />
      </feature>
      <gene systematic_name="NM_020341" primary_name="PAK7" description="Homo sapiens p21(CDKN1A)-activated kinase 7 (PAK7), transcript variant 1, mRNA [NM_020341]">
        <accession database="ref" id="NM_020341" />
        <accession database="ref" id="NM_177990" />
        <accession database="ens" id="ENST00000378429" />
        <accession database="ens" id="ENST00000378423" />
        <other name="accessions" value="ref|NM_020341|ref|NM_177990|ens|ENST00000378429|ens|ENST00000378423" />
        <other name="chr_coord" value="chr20:9466136-9466077" />
      </gene>
    </reporter>
    <reporter ... </reporter>
    <reporter ... </reporter>
  </pattern>
</project>


I want to pick out the different names (instances of name, systematic_name and primary_name), put them into an array, and then store this array as a value in a hash keyed on the reporter's name attribute.

If I read this into Perl with XML::Simple and don't specify KeyAttr, then when I try:  my @probekeys = keys %{$data->{pattern}->{reporter}};

it uses name as the hash keys and reads these into an array (which is what I want), but it falls over because name turns out not to be unique.

If I read this into Perl with XML::Simple and specify a different attribute as the key attribute using KeyAttr, then when I try:  my @probekeys = keys %{$data->{pattern}->{reporter}};

XML::Simple insists on forcing things into arrays (even if I use ForceArray => 0) and I keep getting told that pseudo-hashes are deprecated. Does anyone have any smart ideas about how I can get hold of this information without just reading the 500k-line file in one line at a time?
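For reference, here is a minimal sketch of the structure I'm after. It rests on several assumptions. The pseudo-hash message probably appears because $data->{pattern}->{reporter} is an array ref at that point, so %{...} is applied to an array. Passing KeyAttr => [] should turn off folding on name entirely. The inline sample is a hypothetical, cut-down version of the real file with a deliberately repeated reporter name, and %names_for is just an illustrative variable name. Which names belong in each array is my own guess at what is needed.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical, cut-down sample in the same shape as the real file,
# with the reporter name repeated on purpose.
my $xml = <<'XML';
<project by="company" name="personname">
  <pattern name="company-000001" owner="company">
    <reporter name="A_24_P344666" systematic_name="NM_020341">
      <gene systematic_name="NM_020341" primary_name="PAK7" />
    </reporter>
    <reporter name="A_24_P344666" systematic_name="NM_177990">
      <gene systematic_name="NM_177990" primary_name="PAK7" />
    </reporter>
  </pattern>
</project>
XML

# KeyAttr => [] disables folding on name/key/id, so nothing is keyed by "name".
# ForceArray on reporter and gene makes both array refs even when only one exists.
my $data = XMLin($xml, KeyAttr => [], ForceArray => [qw(reporter gene)]);

my %names_for;    # reporter name => array ref of the names collected for it
for my $reporter (@{ $data->{pattern}{reporter} }) {
    my @names = ($reporter->{systematic_name});
    for my $gene (@{ $reporter->{gene} || [] }) {
        push @names, @{$gene}{qw(systematic_name primary_name)};
    }
    # push instead of assign, so a repeated reporter name keeps every entry
    push @{ $names_for{ $reporter->{name} } }, grep { defined } @names;
}

for my $name (sort keys %names_for) {
    print "$name: @{ $names_for{$name} }\n";
}
```

Because reporter is walked as an array here, duplicate name values never collide as hash keys inside XML::Simple; they are merged only in %names_for, where I control how.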

In reply to XML::Simple and pseudo hashes... by nickschurch
