AWallBuilder has asked for the wisdom of the Perl Monks concerning the following question:
I am trying to parse an XML file and am encountering errors; I think it has to do with using the DTD. Here is part of the XML file:
    <?xml version="1.0"?>
    <!DOCTYPE eSummaryResult PUBLIC "-//NLM//DTD eSummaryResult, 29 October 2004//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSummary_041029.dtd">
    <eSummaryResult>
    <DocSum>
        <Id>7597478</Id>
        <Item Name="Caption" Type="String">NC_002192</Item>
        <Item Name="Title" Type="String">Lactococcus lactis plasmid pWV01, complete sequence</Item>
        <Item Name="Extra" Type="String">gi|7597478|ref|NC_002192.1||gnl|NCBI_GENOMES|15284[7597478]</Item>
        <Item Name="Gi" Type="Integer">7597478</Item>
        <Item Name="CreateDate" Type="String">1991/04/15</Item>
        <Item Name="UpdateDate" Type="String">2008/04/09</Item>
        <Item Name="Flags" Type="Integer">520</Item>
        <Item Name="TaxId" Type="Integer">1358</Item>
        <Item Name="Length" Type="Integer">2178</Item>
        <Item Name="Status" Type="String">live</Item>
        <Item Name="ReplacedBy" Type="String"></Item>
        <Item Name="Comment" Type="String"><![CDATA[ ]]></Item>
    </DocSum>
    <DocSum>
        <Id>7597489</Id>
        <Item Name="Caption" Type="String">NC_002193</Item>
        <Item Name="Title" Type="String">Lactococcus lactis cremoris Cremoris Wg2 plasmid pWVO2, complete sequence</Item>
        <Item Name="Extra" Type="String">gi|7597489|ref|NC_002193.1||gnl|NCBI_GENOMES|15285[7597489]</Item>
        <Item Name="Gi" Type="Integer">7597489</Item>
        <Item Name="CreateDate" Type="String">1993/05/10</Item>
        <Item Name="UpdateDate" Type="String">2008/07/17</Item>
        <Item Name="Flags" Type="Integer">776</Item>
        <Item Name="TaxId" Type="Integer">1359</Item>
        <Item Name="Length" Type="Integer">3826</Item>
        <Item Name="Status" Type="String">live</Item>
        <Item Name="ReplacedBy" Type="String"></Item>
        <Item Name="Comment" Type="String"><![CDATA[ ]]></Item>
    </DocSum>
This is my code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::LibXML;

    my $public_id = "-//NLM//DTD eSummaryResult, 29 October 2004//EN";
    my $system_id = "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSummary_041029.dtd";
    my $dtd = XML::LibXML::Dtd->new($public_id, $system_id);

    my $filename = '/g/Washu_PopGen/test_gi_docsumms_delVer4.xml';
    my $parser = XML::LibXML->new();
    my $doc = $parser->parse_file($filename);
    my $outfile = '/g/Washu_PopGen/test_gi_taxid_table.txt';

    $doc->validate($dtd);

    open(OUTFILE, ">", $outfile);
    print OUTFILE join("\t", qw(Id TaxId Length Status ReplacedBy)) . "\n";

    foreach my $DocSum ($doc->findnodes('/eSummaryResult/DocSum')) {
        my($Id) = $DocSum->findnodes('./Id');
        print OUTFILE $Id->to_literal, "\t";
        my($TaxId) = $DocSum->findnodes('./TaxId');
        print OUTFILE $TaxId->to_literal, "\t";
        my($Length) = $DocSum->findnodes('./Length');
        print OUTFILE $Length->to_literal, "\t";
        my($Status) = $DocSum->findnodes('./Status');
        print OUTFILE $Status->to_literal, "\t";
        my($ReplacedBy) = $DocSum->findnodes('./ReplacedBy');
        print OUTFILE $ReplacedBy->to_literal, "\n";
    }
This is part of my error message:
    No declaration for element eSummaryResult
    No declaration for element DocSum
    No declaration for element Id
    No declaration for element Item
    No declaration for attribute Name of element Item
    No declaration for attribute Type of element Item
    No declaration for element Item
    No declaration for attribute Name of element Item
    No declaration for attribute Type of element Item
    No declaration for element Item
    No declaration for attribute Name of element Item
    No declaration for attribute Type of element Item
This is the DTD file:
    <!--
        This is the Current DTD for Entrez eSummary version 2
        $Id: eSummary_041029.dtd 49514 2004-10-29 15:52:04Z parantha $
    -->
    <!-- ================================================================= -->

    <!ELEMENT Id (#PCDATA)>            <!-- \d+ -->
    <!ELEMENT Item (#PCDATA|Item)*>    <!-- .+ -->
    <!ATTLIST Item
        Name CDATA #REQUIRED
        Type (Integer|Date|String|Structure|List|Flags|Qualifier|Enumerator|Unknown) #REQUIRED
    >
    <!ELEMENT ERROR (#PCDATA)>         <!-- .+ -->
    <!ELEMENT DocSum (Id, Item+)>
    <!ELEMENT eSummaryResult (DocSum|ERROR)+>
Thanks! Any help is appreciated.
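Separately from the validation messages, one detail in the sample above is worth noting: TaxId, Length, Status and ReplacedBy are not elements of their own but `Item` elements distinguished by their `Name` attribute, so an XPath such as `./TaxId` matches nothing under a `DocSum`. The following is only a minimal sketch of how the lookup could be written. It rests on several assumptions: the sample document is trimmed and has no DOCTYPE (so nothing is fetched over the network), the DTD declarations are inlined and loaded with `XML::LibXML::Dtd->parse_string` rather than the `XML::LibXML::Dtd->new` call used in the script, and the names `$dtd_string`, `$xml_string` and `docsum_rows` are hypothetical. It is not claimed to explain or remove the "No declaration for element" messages.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# DTD declarations copied from the eSummary DTD quoted above (comments dropped).
my $dtd_string = <<'DTD';
<!ELEMENT Id (#PCDATA)>
<!ELEMENT Item (#PCDATA|Item)*>
<!ATTLIST Item
    Name CDATA #REQUIRED
    Type (Integer|Date|String|Structure|List|Flags|Qualifier|Enumerator|Unknown) #REQUIRED>
<!ELEMENT ERROR (#PCDATA)>
<!ELEMENT DocSum (Id, Item+)>
<!ELEMENT eSummaryResult (DocSum|ERROR)+>
DTD

# Assumption: a shortened sample with one DocSum and no DOCTYPE declaration.
my $xml_string = <<'XML';
<?xml version="1.0"?>
<eSummaryResult>
  <DocSum>
    <Id>7597478</Id>
    <Item Name="TaxId" Type="Integer">1358</Item>
    <Item Name="Length" Type="Integer">2178</Item>
    <Item Name="Status" Type="String">live</Item>
    <Item Name="ReplacedBy" Type="String"></Item>
  </DocSum>
</eSummaryResult>
XML

my $dtd = XML::LibXML::Dtd->parse_string($dtd_string);
my $doc = XML::LibXML->load_xml(string => $xml_string);

# is_valid() returns a boolean; validate() would die with the libxml2 message instead.
print $doc->is_valid($dtd) ? "valid\n" : "not valid\n";

# Hypothetical helper: one tab-separated row per DocSum.
sub docsum_rows {
    my ($doc) = @_;
    my @rows;
    for my $docsum ($doc->findnodes('/eSummaryResult/DocSum')) {
        my @fields = ($docsum->findvalue('./Id'));
        for my $name (qw(TaxId Length Status ReplacedBy)) {
            # Select the Item child by its Name attribute, not by element name.
            push @fields, $docsum->findvalue(sprintf('./Item[@Name="%s"]', $name));
        }
        push @rows, join("\t", @fields);
    }
    return @rows;
}

print join("\t", qw(Id TaxId Length Status ReplacedBy)), "\n";
print "$_\n" for docsum_rows($doc);
```

`findvalue` returns an empty string when nothing matches, so a missing `Item` gives an empty column rather than a "Can't call method on an undefined value" error from calling `to_literal` on an empty `findnodes` result.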
Replies are listed 'Best First'.
Re: LibXML and parsing file with DTD
by ikegami (Patriarch) on Jul 29, 2010 at 16:49 UTC

Re: LibXML and parsing file with DTD
by derby (Abbot) on Jul 29, 2010 at 12:05 UTC
by AWallBuilder (Beadle) on Jul 29, 2010 at 12:39 UTC
by derby (Abbot) on Jul 29, 2010 at 12:55 UTC
by AWallBuilder (Beadle) on Jul 29, 2010 at 13:12 UTC
by derby (Abbot) on Jul 29, 2010 at 13:23 UTC