DTD and xml module

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:


Re: DTD and xml module
by ww (Archbishop) on Jun 14, 2007 at 22:01 UTC

What have you tried? (i.e. show us some code.)
What was the output that you describe, somewhat imprecisely, as "strange?"

Re^2: DTD and xml module

by Anonymous Monk on Jun 14, 2007 at 22:12 UTC

#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
my $file = 'Summary';
my $t = XML::Twig->new(twig_handlers => 
                { docsum => \&docsum, 
                  para => sub {$_->set_gi('Item')}
            }
            );
$t->parsefile($file);
$t->flush;

sub docsum{
    my ($ty,$docsum) = @_;
    $docsum->set_gi('docsum');
    my $title = $docsum->first_child('Item');
    my $and = $title->{'att'}->{'Name'};
    $docsum->flush;
}

<?xml version="1.0"?>
<!DOCTYPE eSummaryResult PUBLIC "-//NLM//DTD eSummaryResult, 29 October 2004//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSummary_041029.dtd">
<eSummaryResult>
<DocSum>
    <Id>25</Id>
    <Item Name="Name" Type="String">ABL1</Item>
    <Item Name="Description" Type="String">v-abl Abelson murine leukemia viral oncogene homolog 1</Item>
    <Item Name="Orgname" Type="String">Homo sapiens</Item>
    <Item Name="Status" Type="Integer">0</Item>
    <Item Name="CurrentID" Type="Integer">0</Item>
    <Item Name="Chromosome" Type="String">9</Item>
    <Item Name="GeneticSource" Type="String">genomic</Item>
    <Item Name="MapLocation" Type="String">9q34.1</Item>
    <Item Name="OtherAliases" Type="String">ABL, JTK7, bcr/abl, c-ABL, p150, v-abl</Item>
    <Item Name="OtherDesignations" Type="String">Abelson murine leukemia viral (v-abl) oncogene homolog 1|BCR/ABL (major breakpoint) fusion peptide|bcr/c-abl oncogene protein|proto-oncogene tyrosine-protein kinase ABL1</Item>
    <Item Name="NomenclatureSymbol" Type="String">ABL1</Item>
    <Item Name="NomenclatureName" Type="String">v-abl Abelson murine leukemia viral oncogene homolog 1</Item>
    <Item Name="NomenclatureStatus" Type="String">Official</Item>
    <Item Name="TaxID" Type="Integer">9606</Item>
    <Item Name="Mim" Type="List">
        <Item Name="int" Type="Integer">189980</Item>
    </Item>
    <Item Name="GenomicInfo" Type="List">
        <Item Name="GenomicInfoType" Type="Structure">
            <Item Name="ChrLoc" Type="String">9</Item>
            <Item Name="ChrAccVer" Type="String">NC_000009.10</Item>
            <Item Name="ChrStart" Type="Integer">132579088</Item>
            <Item Name="ChrStop" Type="Integer">132752882</Item>
        </Item>
    </Item>
</DocSum>

<DocSum>
    <Id>27</Id>
    <Item Name="Name" Type="String">ABL2</Item>
    <Item Name="Description" Type="String">v-abl Abelson murine leukemia viral oncogene homolog 2 (arg, Abelson-related gene)</Item>
    <Item Name="Orgname" Type="String">Homo sapiens</Item>
    <Item Name="Status" Type="Integer">0</Item>
    <Item Name="CurrentID" Type="Integer">0</Item>
    <Item Name="Chromosome" Type="String">1</Item>
    <Item Name="GeneticSource" Type="String">genomic</Item>
    <Item Name="MapLocation" Type="String">1q24-q25</Item>
    <Item Name="OtherAliases" Type="String">RP11-177A2.3, ABLL, ARG</Item>
    <Item Name="OtherDesignations" Type="String">Abelson murine leukemia viral (v-abl) oncogene homolog 2|Abelson-related|v-abl Abelson murine leukemia viral oncogene homolog 2</Item>
    <Item Name="NomenclatureSymbol" Type="String">ABL2</Item>
    <Item Name="NomenclatureName" Type="String">v-abl Abelson murine leukemia viral oncogene homolog 2 (arg, Abelson-related gene)</Item>
    <Item Name="NomenclatureStatus" Type="String">Official</Item>
    <Item Name="TaxID" Type="Integer">9606</Item>
    <Item Name="Mim" Type="List">
        <Item Name="int" Type="Integer">164690</Item>
    </Item>
    <Item Name="GenomicInfo" Type="List">
        <Item Name="GenomicInfoType" Type="Structure">
            <Item Name="ChrLoc" Type="String">1</Item>
            <Item Name="ChrAccVer" Type="String">NC_000001.9</Item>
            <Item Name="ChrStart" Type="Integer">177465358</Item>
            <Item Name="ChrStop" Type="Integer">177343379</Item>
        </Item>
    </Item>
</DocSum>

<DocSum>
    <Id>90</Id>
    <Item Name="Name" Type="String">ACVR1</Item>
    <Item Name="Description" Type="String">activin A receptor, type I</Item>
    <Item Name="Orgname" Type="String">Homo sapiens</Item>
    <Item Name="Status" Type="Integer">0</Item>
    <Item Name="CurrentID" Type="Integer">0</Item>
    <Item Name="Chromosome" Type="String">2</Item>
    <Item Name="GeneticSource" Type="String">genomic</Item>
    <Item Name="MapLocation" Type="String">2q23-q24</Item>
    <Item Name="OtherAliases" Type="String">ACTRI, ACVRLK2, ALK2, FOP, SKR1</Item>
    <Item Name="OtherDesignations" Type="String">activin A receptor, type II-like kinase 2|activin A type I receptor|hydroxyalkyl-protein kinase</Item>
    <Item Name="NomenclatureSymbol" Type="String">ACVR1</Item>
    <Item Name="NomenclatureName" Type="String">activin A receptor, type I</Item>
    <Item Name="NomenclatureStatus" Type="String">Official</Item>
    <Item Name="TaxID" Type="Integer">9606</Item>
    <Item Name="Mim" Type="List">
        <Item Name="int" Type="Integer">102576</Item>
    </Item>
    <Item Name="GenomicInfo" Type="List">
        <Item Name="GenomicInfoType" Type="Structure">
            <Item Name="ChrLoc" Type="String">2</Item>
            <Item Name="ChrAccVer" Type="String">NC_000002.10</Item>
            <Item Name="ChrStart" Type="Integer">158403035</Item>
            <Item Name="ChrStop" Type="Integer">158301206</Item>
        </Item>
    </Item>
</DocSum>

</eSummaryResult>

Re^3: DTD and xml module

by mirod (Canon) on Jun 15, 2007 at 08:32 UTC

I still don't understand quite what it is you want to get. An example of the expected output would definitely help.

Some comments though, maybe they will put you on the right track:

If you don't want to print the document, then don't use flush, because printing is exactly what it does. If what you want is to free memory, then purge is probably what you are looking for.
You have 2 handlers: one on docsum and one on para. Sadly, there are no docsum elements (there is a DocSum, but XML is case-sensitive) and no para elements in the document, so these handlers are never called.
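
To make both points concrete, here is a minimal sketch, not a definitive solution. It assumes the document structure shown in the post above; the function name gene_names and the choice to collect the Name item are illustrative only. The handler is registered on DocSum (matching the case used in the XML) and calls purge instead of flush, so nothing is printed while memory is still released.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: returns the text of the Name item of every <DocSum>.
sub gene_names {
    my ($xml) = @_;
    my @names;
    my $twig = XML::Twig->new(
        twig_handlers => {
            # Element names are case-sensitive: 'DocSum', not 'docsum'.
            DocSum => sub {
                my ($t, $docsum) = @_;
                # Pick the direct child <Item> whose Name attribute is "Name".
                my $item = $docsum->first_child('Item[@Name="Name"]');
                push @names, $item->text if $item;
                # purge discards what has been parsed so far without printing it.
                $t->purge;
            },
        },
    );
    $twig->parse($xml);    # use parsefile($file) to read from a file instead
    return @names;
}
```

With the three records posted above, this should collect ABL1, ABL2 and ACVR1.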


Re: DTD and xml module
by Jenda (Abbot) on Jun 15, 2007 at 12:31 UTC

"parse the xml document" is not very informative. Until we know what you want to do with the XML, there's no telling which module will best fit the task. It's like asking which tool is best for working with wood: if we do not know what you plan to do with the wood, we can't suggest anything.

Jenda
Support Denmark!
Defend the free world!


Re^2: DTD and xml module

by Anonymous Monk on Jun 15, 2007 at 16:50 UTC

Hi Jenda, the XML document is shown in the previous replies. For each item between the <DocSum> tags, I am trying to write a text file with each entry separated by a tab, for example: Id\tItem Name\tItem Description ... and so on.
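
One way this could be approached with XML::Twig is sketched below, under some loudly stated assumptions: the function name docsum_to_tsv is made up for illustration, each row starts with the Id followed by the text of each direct child Item in document order, and nested items of Type "List" (Mim, GenomicInfo) are simply skipped to keep each row flat. Adjust those choices to whatever the output really needs.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: returns one tab-separated line per <DocSum>.
sub docsum_to_tsv {
    my ($xml) = @_;
    my @lines;
    my $twig = XML::Twig->new(
        twig_handlers => {
            DocSum => sub {
                my ($t, $docsum) = @_;
                # Start the row with the text of the <Id> child.
                my @fields = ($docsum->first_child_text('Id'));
                for my $item ($docsum->children('Item')) {
                    # Assumption: nested List items are left out of the row.
                    my $type = $item->att('Type');
                    next if defined $type && $type eq 'List';
                    push @fields, $item->text;
                }
                push @lines, join("\t", @fields);
                # Free the memory used by this record; nothing is printed.
                $t->purge;
            },
        },
    );
    $twig->parse($xml);    # use parsefile($file) to read from a file instead
    return @lines;
}
```

Writing the text file is then a matter of printing each returned line followed by "\n" to a filehandle opened for writing.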
