MonkPaul has asked for the wisdom of the Perl Monks concerning the following question:

Hello, merry xmas (in a politically incorrect way to all you other religion types)

I'm still (!!!) trying to get an XML document parsed. I have used XML::Parser and left it as the default event-based (SAX-style) parser, as I don't want to edit the document, only read it. My code is very simple, but I get some results I was not expecting.

#!/usr/bin/perl -w
use strict;
use warnings;
use XML::Parser;

my $xml;
my $results;

$xml = new XML::Parser(Style => 'Debug');
$xml->parsefile('info.xml');
$xml->parse('<attribute Name="Column_Type" >F635 Median</attribute>');

this gives the following output:

\\ ()
entries || #10;
entries || #9;#9;
entries \\ (ID 10 Name Array_1_Measurement Column Metadata)
entries enumeration || #10;
entries enumeration || #9;#9;#9;
entries enumeration \\ (Name Column_Type)
entries enumeration attribute || F635 Median
entries enumeration //
entries enumeration || #10;
entries enumeration || #9;#9;#9;
entries enumeration \\ (Name Data_Type)
entries enumeration attribute || INTEGER
entries enumeration //
entries enumeration || #10;
entries enumeration || #9;#9;#9;
entries enumeration \\ (Name Origin)
entries enumeration attribute || Feature
entries enumeration //
entries enumeration || #10;
entries enumeration || #9;#9;#9;
entries enumeration \\ (Name Quantitation_Type)
entries enumeration attribute || MeasuredSignal
entries enumeration //
entries enumeration || #10;
entries enumeration || #9;#9;#9;
entries enumeration \\ (Name Scale)
entries enumeration attribute || LINEAR
entries enumeration //
entries enumeration || #10;
entries enumeration || #9;#9;#9;
entries enumeration \\ (Name LabelledExtract)
entries enumeration attribute || -
entries enumeration //
entries //

I was just expecting to get back either F635 Median or all the info relating to that node.

The xml doc contains:

<?xml version="1.0" encoding="UTF-8"?>
<entries>
    <enumeration ID="10" Name="Array_1_Measurement Column Metadata" >
        <attribute Name="Column_Type" >F635 Median</attribute>
        <attribute Name="Data_Type" >INTEGER</attribute>
        <attribute Name="Origin" >Feature</attribute>
        <attribute Name="Quantitation_Type" >MeasuredSignal</attribute>
        <attribute Name="Scale" >LINEAR</attribute>
        <attribute Name="LabelledExtract" >-</attribute></enumeration>
    <enumeration ID="1" Name="Array_1_Measurement Data" >
        <attribute Name="Gene" >AAC1</attribute>
        <attribute Name="F635 Median" >325</attribute>
        <attribute Name="B635 Median" >103</attribute></enumeration>
    <enumeration ID="2" Name="Array_1_Measurement Data" >
        <attribute Name="Gene" >AAC3</attribute>
        <attribute Name="F635 Median" >389</attribute>
        <attribute Name="B635 Median" >115</attribute></enumeration>
</entries>

Can anybody tell me whether this is what's supposed to happen, or am I not defining the nodes in the Perl script properly?

Any help would be great.
MonkPaul.

Replies are listed 'Best First'.
Re: XML Parsing
by thedoe (Monk) on Dec 14, 2005 at 18:43 UTC

    Have you tried using the XML::Simple module? As its name implies, this module is very simple to use for basic XML reading and writing (in case you decide to write in the future).

    In the documentation for XML::Parser the parse method mentions that the first argument should be the source of the data, so perhaps this is what is causing you difficulty in your efforts.

    I would recommend going with XML::Simple, though. It has helped me quickly parse through many XML documents.
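
    A minimal sketch of what that could look like, assuming the structure of the sample document in the question. The cut-down inline sample, the column_types helper and the ForceArray/KeyAttr settings are illustrative choices, not something taken from the original script:

    ```perl
    use strict;
    use warnings;
    use XML::Simple qw(XMLin);

    # Cut-down copy of the sample document (illustrative only)
    my $xml = q{<entries>
        <enumeration ID="10" Name="Array_1_Measurement Column Metadata">
            <attribute Name="Column_Type">F635 Median</attribute>
            <attribute Name="Data_Type">INTEGER</attribute>
        </enumeration>
        <enumeration ID="1" Name="Array_1_Measurement Data">
            <attribute Name="Gene">AAC1</attribute>
            <attribute Name="F635 Median">325</attribute>
        </enumeration>
    </entries>};

    # Hypothetical helper: return the text of every
    # <attribute Name="Column_Type"> element in the document.
    sub column_types {
        my ($xml_text) = @_;

        # ForceArray keeps both element types as array references even when
        # only one is present; KeyAttr => [] turns off folding by key attribute.
        my $data = XMLin(
            $xml_text,
            ForceArray => [ 'enumeration', 'attribute' ],
            KeyAttr    => [],
        );

        my @found;
        for my $enum ( @{ $data->{enumeration} } ) {
            for my $attr ( @{ $enum->{attribute} } ) {
                # Element text is stored under the default 'content' key
                push @found, $attr->{content}
                    if $attr->{Name} eq 'Column_Type';
            }
        }
        return @found;
    }

    print "$_\n" for column_types($xml);
    ```

    With the sample string above this prints F635 Median. To read from a file instead, XMLin also accepts a file name such as 'info.xml' as its first argument.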

Re: XML Parsing
by GrandFather (Saint) on Dec 14, 2005 at 21:40 UTC

    You should read the documentation for XML::Parser, which tells you that this is what you will get: the Debug style just prints the document out in outline form.

    To solve your actual problem try using XML::TreeBuilder. Here's some code to get you started:

    use strict;
    use warnings;
    use XML::TreeBuilder;

    my $xml;
    my $results;

    $xml = XML::TreeBuilder->new;
    $xml->parse(do{local $/; <DATA>});

    # Find every element whose Name attribute is 'Column_Type'
    my @elements = $xml->look_down('Name', 'Column_Type');

    for (@elements) {
        next if $_->as_text () ne 'F635 Median';
        # Print the text of the enclosing <enumeration> element
        print $_->parent ()->as_text();
    }

    __DATA__
    <?xml version="1.0" encoding="UTF-8"?>
    <entries>
        <enumeration ID="10" Name="Array_1_Measurement Column Metadata" >
            <attribute Name="Column_Type" >F635 Median</attribute>
            <attribute Name="Data_Type" >INTEGER</attribute>
            <attribute Name="Origin" >Feature</attribute>
            <attribute Name="Quantitation_Type" >MeasuredSignal</attribute>
            <attribute Name="Scale" >LINEAR</attribute>
            <attribute Name="LabelledExtract" >-</attribute></enumeration>
        <enumeration ID="1" Name="Array_1_Measurement Data" >
            <attribute Name="Gene" >AAC1</attribute>
            <attribute Name="F635 Median" >325</attribute>
            <attribute Name="B635 Median" >103</attribute></enumeration>
        <enumeration ID="2" Name="Array_1_Measurement Data" >
            <attribute Name="Gene" >AAC3</attribute>
            <attribute Name="F635 Median" >389</attribute>
            <attribute Name="B635 Median" >115</attribute></enumeration>
    </entries>

    Prints:

    F635 Median INTEGER Feature MeasuredSignal LINEAR -
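
    If you would rather stay with XML::Parser, here is a rough sketch that swaps the Debug style for explicit Start/Char/End handlers and keeps only the text you ask for. The attribute_values helper and the cut-down sample string are made up for illustration:

    ```perl
    use strict;
    use warnings;
    use XML::Parser;

    # Hypothetical helper: collect the text of every <attribute> element
    # whose Name attribute equals $wanted.
    sub attribute_values {
        my ($xml_text, $wanted) = @_;
        my @values;
        my $capturing = 0;     # true while inside a matching <attribute>
        my $buffer    = '';

        my $parser = XML::Parser->new(
            Handlers => {
                Start => sub {
                    my ($expat, $element, %attrs) = @_;
                    if (   $element eq 'attribute'
                        && defined $attrs{Name}
                        && $attrs{Name} eq $wanted )
                    {
                        $capturing = 1;
                        $buffer    = '';
                    }
                },
                Char => sub {
                    my ($expat, $text) = @_;
                    # Char can fire more than once per element, so accumulate
                    $buffer .= $text if $capturing;
                },
                End => sub {
                    my ($expat, $element) = @_;
                    if ( $capturing && $element eq 'attribute' ) {
                        push @values, $buffer;
                        $capturing = 0;
                    }
                },
            },
        );

        $parser->parse($xml_text);
        return @values;
    }

    # Cut-down copy of the sample document (illustrative only)
    my $xml = q{<entries>
        <enumeration ID="10" Name="Array_1_Measurement Column Metadata">
            <attribute Name="Column_Type">F635 Median</attribute>
        </enumeration>
        <enumeration ID="1" Name="Array_1_Measurement Data">
            <attribute Name="Gene">AAC1</attribute>
            <attribute Name="F635 Median">325</attribute>
        </enumeration>
        <enumeration ID="2" Name="Array_1_Measurement Data">
            <attribute Name="Gene">AAC3</attribute>
            <attribute Name="F635 Median">389</attribute>
        </enumeration>
    </entries>};

    print join(', ', attribute_values($xml, 'Column_Type')), "\n";
    print join(', ', attribute_values($xml, 'F635 Median')), "\n";
    ```

    The handlers only see one event at a time, so the closures have to carry the state ($capturing and $buffer) themselves. That bookkeeping is the main reason a tree-based module is usually less work for this kind of lookup.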

    DWIM is Perl's answer to Gödel
      Thank you for your reply.
      That is just what I was after. I was looking to get the values for genes based on some experimental data.

      Seems like a perfect place to start now.
      MonkPaul.