I'm updating this post to show you working code for that piece of XML.
In the XML I had to add the <NCI_PID_XML> and <Ontology> nodes:
<NCI_PID_XML>
  <Ontology>
    <LabelType name="function" id="8">
      <LabelValueList>
        <LabelValue name="acyltransferase activity" id="8007" parent_idref="8000" GO="GO:0008415" />
        <LabelValue name="calcium- and calmodulin-responsive adenylate cyclase activity" id="8008" parent_idref="8000" GO="GO:0008294" />
        <LabelValue name="casein kinase I activity" id="11900" parent_idref="8000" GO="GO:0004681" />
        <LabelValue name="casein kinase activity" id="9634" parent_idref="8000" GO="GO:0004680" />
        <LabelValue name="function" id="75" parent_idref="75" />
        <LabelValue name="guanylate cyclase activity" id="8009" parent_idref="8000" GO="GO:0004383" />
        <LabelValue name="interleukin-12 receptor activity" id="8015" parent_idref="8000" GO="GO:0016517" />
        <LabelValue name="metalloendopeptidase activity" id="8010" parent_idref="8000" GO="GO:0004222" />
        <LabelValue name="molecular_function" id="8000" parent_idref="75" GO="GO:0003674" />
        <LabelValue name="potassium channel inhibitor activity" id="10264" parent_idref="8000" GO="GO:0019870" />
        <LabelValue name="protein serine/threonine phosphatase activity" id="8002" parent_idref="8000" GO="GO:0004722" />
        <LabelValue name="protein tyrosine phosphatase activity" id="8013" parent_idref="8000" GO="GO:0004725" />
        <LabelValue name="retinol isomerase activity" id="8012" parent_idref="8000" GO="GO:0050251" />
        <LabelValue name="serine protease" id="8003" parent_idref="8000" />
        <LabelValue name="specific transcriptional repressor activity" id="8019" parent_idref="8000" GO="GO:0016566" />
        <LabelValue name="telomeric DNA binding" id="8005" parent_idref="8000" GO="GO:0042162" />
        <LabelValue name="transcription factor activity" id="8018" parent_idref="8000" GO="GO:0003700" />
        <LabelValue name="transcription repressor activity" id="8017" parent_idref="8000" GO="GO:0016564" />
        <LabelValue name="tumor necrosis factor receptor activity" id="8004" parent_idref="8000" GO="GO:0005031" />
      </LabelValueList>
    </LabelType>
  </Ontology>
</NCI_PID_XML>
I also changed your code to cycle through the LabelValue nodes and call getAttribute on each of them.
I also closed the file at the end and corrected the typos in $id and $name.
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;

my $file = "nci.xml";
my $xp   = XML::XPath->new(filename => $file);

# Open the output file for writing; stop early if that fails.
open(my $info, '>', 'nci.txt') or die "Cannot open nci.txt: $!";

# Each LabelType supplies the parent id and the type name.
foreach my $concept ($xp->findnodes('/NCI_PID_XML/Ontology/LabelType')) {
    my $parentid = $concept->getAttribute('id');
    my $type     = $concept->getAttribute('name');

    # Cycle through the LabelValue nodes below this LabelType.
    foreach my $LabelValue ($concept->findnodes('LabelValueList/LabelValue')) {
        my $id   = $LabelValue->getAttribute('id');
        my $name = $LabelValue->getAttribute('name');
        my $goid = $LabelValue->getAttribute('GO');
        $goid = '' unless defined $goid;    # not every LabelValue has a GO attribute

        # One tab-separated record per LabelValue.
        print $info join("\t", $parentid, $type, $id, $name, $goid), "\n";
    }
}

close $info;
And the nci.txt file now contains:
8	function	8007	acyltransferase activity	GO:0008415
8	function	8008	calcium- and calmodulin-responsive adenylate cyclase activity	GO:0008294
8	function	11900	casein kinase I activity	GO:0004681
8	function	9634	casein kinase activity	GO:0004680
8	function	75	function
8	function	8009	guanylate cyclase activity	GO:0004383
8	function	8015	interleukin-12 receptor activity	GO:0016517
8	function	8010	metalloendopeptidase activity	GO:0004222
8	function	8000	molecular_function	GO:0003674
8	function	10264	potassium channel inhibitor activity	GO:0019870
8	function	8002	protein serine/threonine phosphatase activity	GO:0004722
8	function	8013	protein tyrosine phosphatase activity	GO:0004725
8	function	8012	retinol isomerase activity	GO:0050251
8	function	8003	serine protease
8	function	8019	specific transcriptional repressor activity	GO:0016566
8	function	8005	telomeric DNA binding	GO:0042162
8	function	8018	transcription factor activity	GO:0003700
8	function	8017	transcription repressor activity	GO:0016564
8	function	8004	tumor necrosis factor receptor activity	GO:0005031
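
Two of those rows (ids 75 and 8003) end without a GO value, because those LabelValue nodes have no GO attribute. If you want to try the attribute handling without touching nci.xml, here is a minimal sketch. It assumes you pass the XML as a string through the xml option of XML::XPath->new, and the two-node excerpt and the @rows array are just my own illustration, not part of your script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical two-node excerpt of the data, inlined so the sketch is self-contained.
my $xml = <<'XML';
<NCI_PID_XML>
  <Ontology>
    <LabelType name="function" id="8">
      <LabelValueList>
        <LabelValue name="acyltransferase activity" id="8007" parent_idref="8000" GO="GO:0008415" />
        <LabelValue name="serine protease" id="8003" parent_idref="8000" />
      </LabelValueList>
    </LabelType>
  </Ontology>
</NCI_PID_XML>
XML

# Parse from the string instead of from a file.
my $xp = XML::XPath->new(xml => $xml);

my @rows;
foreach my $lv ($xp->findnodes('/NCI_PID_XML/Ontology/LabelType/LabelValueList/LabelValue')) {
    my $goid = $lv->getAttribute('GO');

    # Assumption: a missing attribute comes back undefined or empty;
    # normalise both cases to an empty string.
    $goid = '' unless defined $goid && length $goid;

    push @rows, join("\t", $lv->getAttribute('id'), $lv->getAttribute('name'), $goid);
}

print "$_\n" for @rows;
```

The second row keeps its trailing tab, so every record still has the same number of fields if you later split on tabs.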

In reply to Re^3: problems with Xpath by olus
in thread problems with Xpath by Anonymous Monk
