I'm updating this post to show you working code for that piece of XML.
In the XML, I had to add the enclosing <NCI_PID_XML> and <Ontology> nodes:
<NCI_PID_XML>
  <Ontology>
    <LabelType name="function" id="8">
      <LabelValueList>
        <LabelValue name="acyltransferase activity" id="8007" parent_idref="8000" GO="GO:0008415" />
        <LabelValue name="calcium- and calmodulin-responsive adenylate cyclase activity" id="8008" parent_idref="8000" GO="GO:0008294" />
        <LabelValue name="casein kinase I activity" id="11900" parent_idref="8000" GO="GO:0004681" />
        <LabelValue name="casein kinase activity" id="9634" parent_idref="8000" GO="GO:0004680" />
        <LabelValue name="function" id="75" parent_idref="75" />
        <LabelValue name="guanylate cyclase activity" id="8009" parent_idref="8000" GO="GO:0004383" />
        <LabelValue name="interleukin-12 receptor activity" id="8015" parent_idref="8000" GO="GO:0016517" />
        <LabelValue name="metalloendopeptidase activity" id="8010" parent_idref="8000" GO="GO:0004222" />
        <LabelValue name="molecular_function" id="8000" parent_idref="75" GO="GO:0003674" />
        <LabelValue name="potassium channel inhibitor activity" id="10264" parent_idref="8000" GO="GO:0019870" />
        <LabelValue name="protein serine/threonine phosphatase activity" id="8002" parent_idref="8000" GO="GO:0004722" />
        <LabelValue name="protein tyrosine phosphatase activity" id="8013" parent_idref="8000" GO="GO:0004725" />
        <LabelValue name="retinol isomerase activity" id="8012" parent_idref="8000" GO="GO:0050251" />
        <LabelValue name="serine protease" id="8003" parent_idref="8000" />
        <LabelValue name="specific transcriptional repressor activity" id="8019" parent_idref="8000" GO="GO:0016566" />
        <LabelValue name="telomeric DNA binding" id="8005" parent_idref="8000" GO="GO:0042162" />
        <LabelValue name="transcription factor activity" id="8018" parent_idref="8000" GO="GO:0003700" />
        <LabelValue name="transcription repressor activity" id="8017" parent_idref="8000" GO="GO:0016564" />
        <LabelValue name="tumor necrosis factor receptor activity" id="8004" parent_idref="8000" GO="GO:0005031" />
      </LabelValueList>
    </LabelType>
  </Ontology>
</NCI_PID_XML>
I also changed your code to loop over the LabelValue nodes and call getAttribute on each one.
I also closed the file at the end and corrected the typos in $id and $name.
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;

my $file = "nci.xml";
my $xp   = XML::XPath->new(filename => $file);

# Write one tab-separated line per LabelValue node.
open(my $info, '>', 'nci.txt') or die "Cannot open nci.txt: $!";
foreach my $concept ($xp->findnodes('/NCI_PID_XML/Ontology/LabelType')) {
    my $parentid = $concept->getAttribute('id');
    my $type     = $concept->getAttribute('name');
    foreach my $LabelValue ($concept->findnodes('LabelValueList/LabelValue')) {
        my $id   = $LabelValue->getAttribute('id');
        my $name = $LabelValue->getAttribute('name');
        # Not every LabelValue has a GO attribute; fall back to an empty field.
        my $goid = $LabelValue->getAttribute('GO');
        $goid = '' unless defined $goid;
        print $info join("\t", $parentid, $type, $id, $name, $goid), "\n";
    }
}
close $info;
The nci.txt file now contains:
8	function	8007	acyltransferase activity	GO:0008415
8	function	8008	calcium- and calmodulin-responsive adenylate cyclase activity	GO:0008294
8	function	11900	casein kinase I activity	GO:0004681
8	function	9634	casein kinase activity	GO:0004680
8	function	75	function
8	function	8009	guanylate cyclase activity	GO:0004383
8	function	8015	interleukin-12 receptor activity	GO:0016517
8	function	8010	metalloendopeptidase activity	GO:0004222
8	function	8000	molecular_function	GO:0003674
8	function	10264	potassium channel inhibitor activity	GO:0019870
8	function	8002	protein serine/threonine phosphatase activity	GO:0004722
8	function	8013	protein tyrosine phosphatase activity	GO:0004725
8	function	8012	retinol isomerase activity	GO:0050251
8	function	8003	serine protease
8	function	8019	specific transcriptional repressor activity	GO:0016566
8	function	8005	telomeric DNA binding	GO:0042162
8	function	8018	transcription factor activity	GO:0003700
8	function	8017	transcription repressor activity	GO:0016564
8	function	8004	tumor necrosis factor receptor activity	GO:0005031
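
Note that two rows (ids 75 and 8003) end without a GO id, because those LabelValue nodes have no GO attribute. If you would rather skip such nodes, one option is an XPath predicate. This is only a minimal sketch: the inline XML sample and the rows_with_go helper are made up for illustration, and I'm assuming the same XML::XPath setup as in the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;

# Small inline sample (illustrative only), same shape as nci.xml.
my $xml = <<'XML';
<NCI_PID_XML>
  <Ontology>
    <LabelType name="function" id="8">
      <LabelValueList>
        <LabelValue name="casein kinase activity" id="9634" parent_idref="8000" GO="GO:0004680" />
        <LabelValue name="serine protease" id="8003" parent_idref="8000" />
      </LabelValueList>
    </LabelType>
  </Ontology>
</NCI_PID_XML>
XML

# Hypothetical helper: one tab-separated row per LabelValue that has a GO attribute.
sub rows_with_go {
    my ($xp) = @_;
    my @rows;
    foreach my $type ($xp->findnodes('/NCI_PID_XML/Ontology/LabelType')) {
        my $parentid = $type->getAttribute('id');
        my $typename = $type->getAttribute('name');
        # The [@GO] predicate keeps only nodes that carry a GO attribute.
        foreach my $lv ($type->findnodes('LabelValueList/LabelValue[@GO]')) {
            my $id   = $lv->getAttribute('id');
            my $name = $lv->getAttribute('name');
            my $goid = $lv->getAttribute('GO');
            push @rows, join("\t", $parentid, $typename, $id, $name, $goid);
        }
    }
    return @rows;
}

my $xp = XML::XPath->new(xml => $xml);
print "$_\n" for rows_with_go($xp);
```

With the sample above, the "serine protease" node should be left out, since it has no GO attribute. Whether to drop or keep such rows depends on what you do with nci.txt afterwards.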