Hi all,

I am trying to parse an XML file and am getting errors. I think they are caused by the way I use the DTD. Here is part of the XML file:

```xml
<?xml version="1.0"?>
<!DOCTYPE eSummaryResult PUBLIC "-//NLM//DTD eSummaryResult, 29 October 2004//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSummary_041029.dtd">
<eSummaryResult>
<DocSum>
    <Id>7597478</Id>
    <Item Name="Caption" Type="String">NC_002192</Item>
    <Item Name="Title" Type="String">Lactococcus lactis plasmid pWV01, complete sequence</Item>
    <Item Name="Extra" Type="String">gi|7597478|ref|NC_002192.1||gnl|NCBI_GENOMES|15284[7597478]</Item>
    <Item Name="Gi" Type="Integer">7597478</Item>
    <Item Name="CreateDate" Type="String">1991/04/15</Item>
    <Item Name="UpdateDate" Type="String">2008/04/09</Item>
    <Item Name="Flags" Type="Integer">520</Item>
    <Item Name="TaxId" Type="Integer">1358</Item>
    <Item Name="Length" Type="Integer">2178</Item>
    <Item Name="Status" Type="String">live</Item>
    <Item Name="ReplacedBy" Type="String"></Item>
    <Item Name="Comment" Type="String"><![CDATA[ ]]></Item>
</DocSum>
<DocSum>
    <Id>7597489</Id>
    <Item Name="Caption" Type="String">NC_002193</Item>
    <Item Name="Title" Type="String">Lactococcus lactis cremoris Cremoris Wg2 plasmid pWVO2, complete sequence</Item>
    <Item Name="Extra" Type="String">gi|7597489|ref|NC_002193.1||gnl|NCBI_GENOMES|15285[7597489]</Item>
    <Item Name="Gi" Type="Integer">7597489</Item>
    <Item Name="CreateDate" Type="String">1993/05/10</Item>
    <Item Name="UpdateDate" Type="String">2008/07/17</Item>
    <Item Name="Flags" Type="Integer">776</Item>
    <Item Name="TaxId" Type="Integer">1359</Item>
    <Item Name="Length" Type="Integer">3826</Item>
    <Item Name="Status" Type="String">live</Item>
    <Item Name="ReplacedBy" Type="String"></Item>
    <Item Name="Comment" Type="String"><![CDATA[ ]]></Item>
</DocSum>
```
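In this format the fields are not separate elements: TaxId, Length, Status and the rest are `<Item>` elements told apart by their `Name` attribute. A minimal sketch of pulling them out with an XPath attribute predicate, assuming XML::LibXML and a trimmed copy of the first `<DocSum>` with the DOCTYPE line left out (the helper `item_value` and the `@rows` layout are illustrative names, not part of any API):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed copy of the first <DocSum> above. The DOCTYPE line is left out
# so this sketch does not need to fetch anything from NCBI.
my $xml = <<'XML';
<eSummaryResult>
  <DocSum>
    <Id>7597478</Id>
    <Item Name="TaxId" Type="Integer">1358</Item>
    <Item Name="Length" Type="Integer">2178</Item>
    <Item Name="Status" Type="String">live</Item>
    <Item Name="ReplacedBy" Type="String"></Item>
  </DocSum>
</eSummaryResult>
XML

my $doc = XML::LibXML->new->parse_string($xml);

# Hypothetical helper: text of the <Item> whose Name attribute equals $name
# (an empty string if there is no such Item or it has no text).
sub item_value {
    my ($docsum, $name) = @_;
    return $docsum->findvalue(qq{./Item[\@Name="$name"]});
}

# One row (Id, TaxId, Length, Status, ReplacedBy) per <DocSum>.
my @rows;
for my $docsum ($doc->findnodes('/eSummaryResult/DocSum')) {
    push @rows, [ $docsum->findvalue('./Id'),
                  map { item_value($docsum, $_) } qw(TaxId Length Status ReplacedBy) ];
}

print join("\t", @$_), "\n" for @rows;
```

Run as-is, it should print one tab-separated line for the sample record; the last field is empty because ReplacedBy has no text.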

This is my code:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

my $public_id = "-//NLM//DTD eSummaryResult, 29 October 2004//EN";
my $system_id = "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSummary_041029.dtd";
my $dtd = XML::LibXML::Dtd->new($public_id, $system_id);

my $filename = '/g/Washu_PopGen/test_gi_docsumms_delVer4.xml';
my $parser   = XML::LibXML->new();
my $doc      = $parser->parse_file($filename);
my $outfile  = '/g/Washu_PopGen/test_gi_taxid_table.txt';

# validate() dies with the list of validity errors if $doc does not match $dtd
$doc->validate($dtd);

open(my $out, '>', $outfile) or die "Cannot open $outfile: $!";

# "\t" rather than "t", so the header is tab-separated like the data rows
print $out join("\t", qw(Id TaxId Length Status ReplacedBy)), "\n";

foreach my $DocSum ($doc->findnodes('/eSummaryResult/DocSum')) {
    # Id is a real child element of DocSum
    my @row = ($DocSum->findvalue('./Id'));

    # TaxId, Length, Status and ReplacedBy are not elements; they are
    # <Item> elements identified by their Name attribute
    foreach my $name (qw(TaxId Length Status ReplacedBy)) {
        push @row, $DocSum->findvalue(qq{./Item[\@Name="$name"]});
    }
    print $out join("\t", @row), "\n";
}

close($out) or die "Cannot close $outfile: $!";
```

This is part of my error message:

```
No declaration for element eSummaryResult
No declaration for element DocSum
No declaration for element Id
No declaration for element Item
No declaration for attribute Name of element Item
No declaration for attribute Type of element Item
No declaration for element Item
No declaration for attribute Name of element Item
No declaration for attribute Type of element Item
No declaration for element Item
No declaration for attribute Name of element Item
No declaration for attribute Type of element Item
```
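libxml2 reports "No declaration for element ..." when the DTD passed to `validate()` contains no declaration for an element it encounters. A minimal sketch that reproduces the message with a deliberately unrelated one-line DTD (hypothetical, for illustration only) and traps the failure with `eval`, so the messages can be inspected instead of ending the script:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical DTD for illustration: it declares none of the eSummary elements.
my $unrelated_dtd = XML::LibXML::Dtd->parse_string('<!ELEMENT Unrelated (#PCDATA)>');

my $doc = XML::LibXML->new->parse_string(
    '<eSummaryResult><DocSum><Id>7597478</Id></DocSum></eSummaryResult>'
);

# validate() dies when the document does not match the DTD, so trap the
# failure with eval and keep the stringified error for inspection.
my $ok  = eval { $doc->validate($unrelated_dtd); 1 };
my $err = $ok ? '' : "$@";

print $ok ? "valid\n" : "validation failed:\n$err";
```

`is_valid($dtd)` performs the same check but returns a boolean instead of dying, which can be handier inside a loop over many files.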

This is the DTD file:

```xml
<!-- This is the Current DTD for Entrez eSummary version 2
     $Id: eSummary_041029.dtd 49514 2004-10-29 15:52:04Z parantha $
-->
<!-- ================================================================= -->

<!ELEMENT Id (#PCDATA)>                 <!-- \d+ -->

<!ELEMENT Item (#PCDATA|Item)*>         <!-- .+ -->

<!ATTLIST Item
    Name CDATA #REQUIRED
    Type (Integer|Date|String|Structure|List|Flags|Qualifier|Enumerator|Unknown) #REQUIRED
>

<!ELEMENT ERROR (#PCDATA)>              <!-- .+ -->

<!ELEMENT DocSum (Id, Item+)>

<!ELEMENT eSummaryResult (DocSum|ERROR)+>
```
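As a sanity check on the DTD itself, it can be loaded from a string with `XML::LibXML::Dtd->parse_string` and used with `is_valid`, which returns a boolean instead of dying. A sketch under these assumptions: the DTD text is the one posted above with its comments dropped, and the document is a trimmed copy of the first `<DocSum>` with the DOCTYPE line left out so nothing is fetched from NCBI:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# The eSummary DTD from above (comments removed), loaded from a string.
my $dtd_text = <<'DTD';
<!ELEMENT Id (#PCDATA)>
<!ELEMENT Item (#PCDATA|Item)*>
<!ATTLIST Item
    Name CDATA #REQUIRED
    Type (Integer|Date|String|Structure|List|Flags|Qualifier|Enumerator|Unknown) #REQUIRED
>
<!ELEMENT ERROR (#PCDATA)>
<!ELEMENT DocSum (Id, Item+)>
<!ELEMENT eSummaryResult (DocSum|ERROR)+>
DTD

my $dtd = XML::LibXML::Dtd->parse_string($dtd_text);

# Trimmed sample document without the DOCTYPE line.
my $doc = XML::LibXML->new->parse_string(<<'XML');
<eSummaryResult>
  <DocSum>
    <Id>7597478</Id>
    <Item Name="TaxId" Type="Integer">1358</Item>
    <Item Name="Comment" Type="String"><![CDATA[ ]]></Item>
  </DocSum>
</eSummaryResult>
XML

# is_valid() checks $doc against $dtd and returns true or false.
my $valid = $doc->is_valid($dtd) ? 1 : 0;
print $valid ? "valid\n" : "not valid\n";
```

If this prints `valid`, the declarations themselves accept the sample, which would suggest looking at how the DTD object is obtained rather than at its contents.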

Thanks! Any help is appreciated.


In reply to LibXML and parsing file with DTD by AWallBuilder
