Falantar has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I have an XML file that is formatted rather untidily, and I need to extract as much information from it as I can. Here's an example:

    <IRETURNVALUE>
    <INSTANCE CLASSNAME="junk"><PROPERTY NAME="ID"><DISPLAY TYPE="string">A0</DISPLAY><VALUE TYPE="string">a0</VALUE></PROPERTY><PROPERTY NAME="state"><DISPLAY TYPE="string">Active</DISPLAY><VALUE TYPE="string">active</VALUE></PROPERTY>...</INSTANCE>
    <INSTANCE CLASSNAME="junk"><PROPERTY NAME="ID"><DISPLAY TYPE="string">A1</DISPLAY><VALUE TYPE="string">a1</VALUE></PROPERTY><PROPERTY NAME="state"><DISPLAY TYPE="string">Active</DISPLAY><VALUE TYPE="string">active</VALUE></PROPERTY>...</INSTANCE>
    </IRETURNVALUE>

Each of these lines has about 8 or 9 different properties. What I need to do is to store each of these lines in a %data hash using the ID as the key. My preference is to use XML::Rules to do this, but I'm open to any suggestions using XML::Parser or XML::Twig. So far my pseudocode looks something like this:

    #!/usr/bin/perl -w
    use strict;
    use warnings;
    use XML::Rules;

    my %data;
    my $id;     # $id will be the _content of DISPLAY after the Property Name
    my $var;    # $var will be the _content of DISPLAY

    my @rules = (
        PROPERTY => sub {
            if ($_[1]{NAME} eq "id") {
                $data{$id} = {};
            }
            else {
                $data{$id} = { $_[1]{NAME} => $var };
            }
        },
        DISPLAY => sub {
            # some code to initialize $var
        },
    );

    my $xr = XML::Rules->new(
        rules       => \@rules,
        stripspaces => 2,
    );
    $xr->parse(<DATA>);

This is more or less the direction I was taking. Hope I was clear enough.

Edit: Solved. It's not the most elegant solution, but it does the job.

    #!/usr/bin/perl -w
    use strict;
    use warnings;
    use vars qw/ %options /;
    use Getopt::Std;
    use XML::Rules;
    use Data::Dumper;

    #.....
    # Code to treat file
    #.....

    my %data;
    my $temp;
    my $id;
    my $trigger = 0;

    my @rules = (
        PROPERTY => sub {
            if ($_[1]{NAME} eq "id"){
                $data{$id} = {};    #Creates empty hash using the id as the key
                $trigger = 1;
            }
            elsif ($_[1]{NAME} eq "class"){    #class is the last tag before the new line
                $data{$id} -> {@{$_[1]}{NAME}} = $temp;
                #this resets the trigger to let the script know that the next property is
                #going to be the id
                $trigger = 0;
            }
            else {
                $data{$id} -> {@{$_[1]}{NAME}} = $temp;
            }
        },
        DISPLAY => sub {
            if ($trigger == 0){
                $id = @{$_[1]}{_content};
            }
            else {
                $temp = @{$_[1]}{_content};
            }
        },
    );

    my $xr = XML::Rules->new(
        rules => \@rules,
        stripspaces => 2
    );
    $xr->parsefile($File);
    print Dumper(\%data);

Replies are listed 'Best First'.
Re: (Solved) : XML::Rules using parent/child to parse through XML
by runrig (Abbot) on Aug 24, 2011 at 16:00 UTC
    It seems like you want to capture a 'row' of data at the INSTANCE level, and each row of data is keyed by the NAME of the PROPERTY and the content of the DISPLAY node under it, so I would go with:
    my @rules = (
        INSTANCE => sub {
            # .... process properties
            # e.g.
            print Data::Dumper::Dumper($_[1]);
            return;
        },
        PROPERTY => sub { $_[1]{NAME} => $_[1]{DISPLAY} },
        DISPLAY  => 'content',
    );
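    A minimal, self-contained sketch of how these rules might be completed to fill %data keyed by ID. The sample XML is trimmed from the original post (the elided "..." properties are left out), and the VALUE and IRETURNVALUE rules are assumptions added here so that every tag in the sample has an explicit rule:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Rules;
use Data::Dumper;

# Sample trimmed from the original post; the real file has more properties.
my $xml = <<'XML';
<IRETURNVALUE>
<INSTANCE CLASSNAME="junk"><PROPERTY NAME="ID"><DISPLAY TYPE="string">A0</DISPLAY><VALUE TYPE="string">a0</VALUE></PROPERTY><PROPERTY NAME="state"><DISPLAY TYPE="string">Active</DISPLAY><VALUE TYPE="string">active</VALUE></PROPERTY></INSTANCE>
<INSTANCE CLASSNAME="junk"><PROPERTY NAME="ID"><DISPLAY TYPE="string">A1</DISPLAY><VALUE TYPE="string">a1</VALUE></PROPERTY><PROPERTY NAME="state"><DISPLAY TYPE="string">Active</DISPLAY><VALUE TYPE="string">active</VALUE></PROPERTY></INSTANCE>
</IRETURNVALUE>
XML

my %data;

my @rules = (
    # The text of DISPLAY reaches the parent PROPERTY as $_[1]{DISPLAY}.
    DISPLAY  => 'content',
    # Assumption: VALUE is not needed, so it is ignored.
    VALUE    => undef,
    # Each PROPERTY hands a (name => display text) pair up to its INSTANCE.
    PROPERTY => sub { $_[1]{NAME} => $_[1]{DISPLAY} },
    # By now $_[1] holds the INSTANCE attributes plus one key per PROPERTY.
    INSTANCE => sub {
        my %row = %{ $_[1] };
        delete $row{_content};    # drop any stray text between the tags
        my $id = $row{ID};
        $data{$id} = \%row if defined $id;
        return;                   # nothing to pass up to the root
    },
    # Assumption: the root carries nothing we need.
    IRETURNVALUE => sub { return },
);

my $parser = XML::Rules->new( rules => \@rules, stripspaces => 2 );
$parser->parse($xml);

$Data::Dumper::Sortkeys = 1;
print Dumper( \%data );
```

    Note that each row also carries the INSTANCE attributes (CLASSNAME here), because XML::Rules delivers attributes and child results in the same hash. A PROPERTY whose NAME matches an attribute name would overwrite it.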
Re: (Solved) : XML::Rules using parent/child to parse through XML
by locked_user sundialsvc4 (Abbot) on Aug 25, 2011 at 13:21 UTC

    Prefacing my comments here with the statement that I have not read your posting too closely ... I would, categorically, say that “the appropriate way,” IMHO, for dealing with any “XML file,” is ... XML::Twig.   And, “the right way to get data out of an XML file” is... “XPath expressions.”

    My simple reason for saying this is that I know, from the beginning, that this particular approach will consistently work.   No matter how ugly the file is.   No matter how big it is.   If you feel like you’re going to have to start writing a programming structure to match the data structure, filled with nested-loops and hashrefs and what-not, slap your fingers and tell them to “back away from that keyboard, slowly.”   Use searches, and let XPath do all of the searching for you.

    Kindly don’t take these opinions to be as “dogmatic” as perhaps they initially sound.   I am merely saying:   It Just Works™, and I Like That™ ...   A Lot.™
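    For comparison, a rough sketch of what the XML::Twig route could look like, assuming the sample structure from the original post (trimmed, without the elided properties). It uses a twig handler on INSTANCE rather than a full XPath query, and purges each row after copying it so that memory use stays flat on large files:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
use Data::Dumper;

# Sample trimmed from the original post; the real file has more properties.
my $xml = <<'XML';
<IRETURNVALUE>
<INSTANCE CLASSNAME="junk"><PROPERTY NAME="ID"><DISPLAY TYPE="string">A0</DISPLAY><VALUE TYPE="string">a0</VALUE></PROPERTY><PROPERTY NAME="state"><DISPLAY TYPE="string">Active</DISPLAY><VALUE TYPE="string">active</VALUE></PROPERTY></INSTANCE>
<INSTANCE CLASSNAME="junk"><PROPERTY NAME="ID"><DISPLAY TYPE="string">A1</DISPLAY><VALUE TYPE="string">a1</VALUE></PROPERTY><PROPERTY NAME="state"><DISPLAY TYPE="string">Active</DISPLAY><VALUE TYPE="string">active</VALUE></PROPERTY></INSTANCE>
</IRETURNVALUE>
XML

my %data;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once per INSTANCE, after the whole element has been parsed.
        INSTANCE => sub {
            my ( $t, $instance ) = @_;
            my %row;
            for my $prop ( $instance->children('PROPERTY') ) {
                $row{ $prop->att('NAME') } = $prop->first_child_text('DISPLAY');
            }
            $data{ $row{ID} } = \%row if defined $row{ID};
            $t->purge;    # release what has been parsed so far
        },
    },
);
$twig->parse($xml);

$Data::Dumper::Sortkeys = 1;
print Dumper( \%data );
```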

      I disagree. XML::Rules also 'just works', often more elegantly than XML::Twig. And sometimes I go straight to XML::LibXML purely for picking things out with XPath (if the file is "not too large"), especially if the XPath is complex.
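      A sketch of the XML::LibXML approach, assuming the file fits comfortably in memory. The XPath expressions are guesses based on the trimmed sample from the original post, not on the real file:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use Data::Dumper;

# Sample trimmed from the original post; the real file has more properties.
my $xml = <<'XML';
<IRETURNVALUE>
<INSTANCE CLASSNAME="junk"><PROPERTY NAME="ID"><DISPLAY TYPE="string">A0</DISPLAY><VALUE TYPE="string">a0</VALUE></PROPERTY><PROPERTY NAME="state"><DISPLAY TYPE="string">Active</DISPLAY><VALUE TYPE="string">active</VALUE></PROPERTY></INSTANCE>
<INSTANCE CLASSNAME="junk"><PROPERTY NAME="ID"><DISPLAY TYPE="string">A1</DISPLAY><VALUE TYPE="string">a1</VALUE></PROPERTY><PROPERTY NAME="state"><DISPLAY TYPE="string">Active</DISPLAY><VALUE TYPE="string">active</VALUE></PROPERTY></INSTANCE>
</IRETURNVALUE>
XML

# The whole document is loaded into memory as a DOM tree.
my $doc = XML::LibXML->load_xml( string => $xml );

my %data;
for my $instance ( $doc->findnodes('/IRETURNVALUE/INSTANCE') ) {
    # Pick the row key: the DISPLAY text of the PROPERTY named "ID".
    my $id = $instance->findvalue('PROPERTY[@NAME="ID"]/DISPLAY');
    next unless length $id;    # skip rows without an ID

    # Store every property's DISPLAY text under its NAME.
    for my $prop ( $instance->findnodes('PROPERTY') ) {
        $data{$id}{ $prop->getAttribute('NAME') } = $prop->findvalue('DISPLAY');
    }
}

$Data::Dumper::Sortkeys = 1;
print Dumper( \%data );
```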