PerlMonks  

XML Parser issue

by siva kumar (Pilgrim)
on Jul 05, 2007 at 07:50 UTC ( [id://625013]=perlquestion )

siva kumar has asked for the wisdom of the Perl Monks concerning the following question:

I have an issue with XML parsing. The following is the XML content:
<story>
  <page>
    <author>Author 1 </author>
    <keywords>Keyword1 </keywords>
    <headline>Headline1 </headline>
    <image> Image1 </image>
    <description>Desc 1 </description>
  </page>
  <page>
    <author>Author 2</author>
    <keywords>Keyword2 </keywords>
    <headline>Headline 2 </headline>
    <image> Image2 </image>
    <description>Decs 2 </description>
  </page>
</story>
I need an array of hash references: the XML above should produce two records, each stored as a hash, and a reference to each hash should be pushed onto an array. I have written the parser below, but the result shows only one record and the second record is empty. Please suggest a fix.
#!/home/y/bin/perl5.8.5 -w
use XML::Parser;
use strict;
use Data::Dumper;

my $parser = new XML::Parser(ErrorContext => 2);
my $xmlStr = "";
my $pageFlag = 0;
my $authorFlag = 0;
my $titleFlag = 0;
my $descFlag = 0;
my $headlineFlag = 0;
my $imageFlag = 0;
my $keywordFlag = 0;
my @pagesArray = ();
my %hash = ();
my $pageEnd = 0;
my $authorTxt = '';
my $keywordTxt = '';
my $headlineTxt = '';
my $imageTxt = '';
my $descTxt = '';

$xmlStr = "
<story>
  <page>
    <author>Author 1 </author>
    <keywords>Keyword1 </keywords>
    <headline>Headline1 </headline>
    <image> Image1 </image>
    <description>Desc 1 </description>
  </page>
  <page>
    <author>Author 2</author>
    <keywords>Keyword2 </keywords>
    <headline>Headline 2 </headline>
    <image> Image2 </image>
    <description>Decs 2 </description>
  </page>
</story>";

$parser->setHandlers(
    Start => \&start_handler,
    Char  => \&char_handler,
    End   => \&end_handler,
);
$parser->parse($xmlStr);
print Dumper @pagesArray;
exit;

sub char_handler {
    my ($p, $data) = @_;
    if ($pageFlag == 1 && $authorFlag == 1) {
        $authorTxt .= $data;
    }
    if ($pageFlag == 1 && $keywordFlag == 1) {
        $keywordTxt .= $data;
    }
    if ($pageFlag == 1 && $headlineFlag == 1) {
        $headlineTxt .= $data;
    }
    if ($pageFlag == 1 && $imageFlag == 1) {
        $imageTxt .= $data;
    }
    if ($pageFlag == 1 && $descFlag == 1) {
        $descTxt .= $data;
    }
}

sub start_handler {
    my ($p, $data) = @_;
    if ($data =~ /^(page)$/) {
        %hash = ();
        $pageFlag = 1;
    }
    if ($data =~ /^author$/) {
        $authorFlag = 1;
    }
    if ($data =~ /^keywords$/) {
        $keywordFlag = 1;
    }
    if ($data =~ /^headline$/) {
        $headlineFlag = 1;
    }
    if ($data =~ /^image$/) {
        $imageFlag = 1;
    }
    if ($data =~ /^description$/) {
        $descFlag = 1;
    }
}

sub end_handler {
    my ($p, $data) = @_;
    if ($data =~ /^author$/) {
        $hash{author} = $authorTxt;
        $authorTxt = '';
        print "Author -- $hash{author} \n";
        $authorFlag = 0;
    }
    if ($data =~ /^keywords$/) {
        $hash{keywords} = $keywordTxt;
        $keywordTxt = '';
        print "Keyword -- $hash{keywords}\n";
        $keywordFlag = 0;
    }
    if ($data =~ /^headline$/) {
        $hash{headline} = $headlineTxt;
        $headlineTxt = '';
        print "Head -- $hash{headline}\n";
        $headlineFlag = 0;
    }
    if ($data =~ /^image$/) {
        $hash{image} = $imageTxt;
        $imageTxt = '';
        print "Image -- $hash{image}\n";
        $imageFlag = 0;
    }
    if ($data =~ /^description$/) {
        $hash{description} = $descTxt;
        $descTxt = '';
        print "Desc -- $hash{description}\n";
        $descFlag = 0;
    }
    if ($data =~ /^page$/) {
        push(@pagesArray, \%hash);
    }
}

Replies are listed 'Best First'.
Re: XML Parser issue
by GrandFather (Saint) on Jul 05, 2007 at 08:39 UTC

    You are pushing multiple references to the same hash. You need to create a new hash for each page. A small change that does that is to change

    if($data =~ /^page$/) { push(@pagesArray,\%hash); }

    to:

    if($data =~ /^page$/) { push @pagesArray, {%hash}; %hash = (); }

    which pushes a reference to a new anonymous hash holding a copy of the current contents, then empties %hash ready for the next page.
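
    To make the difference between \%hash and {%hash} concrete, here is a minimal sketch using only core Perl. The @pages data and the @aliased and @copied arrays are made up for illustration and stand in for the parsed <page> elements; they are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical records standing in for the parsed <page> elements.
my @pages = (
    { author => 'Author 1', headline => 'Headline1' },
    { author => 'Author 2', headline => 'Headline 2' },
);

my %hash;       # one shared hash, reused for every page (as in the original code)
my @aliased;    # filled with \%hash: every element points at the same hash
my @copied;     # filled with {%hash}: every element is an independent copy

for my $page (@pages) {
    %hash = %$page;            # overwrite the shared hash in place
    push @aliased, \%hash;     # reference to the one shared hash
    push @copied,  {%hash};    # reference to a new anonymous hash (shallow copy)
}

# Both aliased entries are the same reference, so both show the last page.
print "aliased: $aliased[0]{author}, $aliased[1]{author}\n";

# The copies keep the values they had when they were pushed.
print "copied:  $copied[0]{author}, $copied[1]{author}\n";
```

    The first line printed is "aliased: Author 2, Author 2" and the second is "copied:  Author 1, Author 2". When the same reference appears twice in its argument list, Data::Dumper normally prints the second occurrence as $VAR2 = $VAR1; which is probably why the second record looked empty.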


    DWIM is Perl's answer to Gödel
Re: XML Parser issue
by rg0now (Chaplain) on Jul 05, 2007 at 12:28 UTC
    I need an array of hash references: the XML above should produce two records, each stored as a hash, and a reference to each hash should be pushed onto an array. I have written the parser below, but the result shows only one record and the second record is empty. Please suggest a fix.
    Well, this is a job for XML::Simple. I think the script below does more or less what you want:
    #!/usr/bin/perl
    use warnings;
    use strict;
    use XML::Simple;
    use Data::Dumper;

    my $xml = XMLin(
        \*DATA,
        ForceArray => [ 'page' ],
    );

    print Dumper $xml->{page};

    __DATA__
    <story>
      <page>
        <author>Author 1 </author>
        <keywords>Keyword1 </keywords>
        <headline>Headline1 </headline>
        <image> Image1 </image>
        <description>Desc 1 </description>
      </page>
      <page>
        <author>Author 2</author>
        <keywords>Keyword2 </keywords>
        <headline>Headline 2 </headline>
        <image> Image2 </image>
        <description>Decs 2 </description>
      </page>
    </story>

    Output:

    $VAR1 = [
              {
                'keywords' => 'Keyword1 ',
                'author' => 'Author 1 ',
                'description' => 'Desc 1 ',
                'image' => ' Image1 ',
                'headline' => 'Headline1 '
              },
              {
                'keywords' => 'Keyword2 ',
                'author' => 'Author 2',
                'description' => 'Decs 2 ',
                'image' => ' Image2 ',
                'headline' => 'Headline 2 '
              }
            ];
    Best regards,
Re: XML Parser issue
by Jenda (Abbot) on Jul 05, 2007 at 12:40 UTC

    Here's the solution using XML::Rules:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use XML::Rules;
    use Data::Dumper;

    my $xml = do { local $/; <DATA> };

    my $parser = XML::Rules->new(
        rules => [
            _default => 'content',
            page     => 'no content array',
            story    => 'pass no content',
        ]
    );

    my $data = $parser->parse($xml);
    print Dumper $data->{page};

    __DATA__
    <story>
      <page>
        <author>Author 1 </author>
        <keywords>Keyword1 </keywords>
        <headline>Headline1 </headline>
        <image> Image1 </image>
        <description>Desc 1 </description>
      </page>
      <page>
        <author>Author 2</author>
        <keywords>Keyword2 </keywords>
        <headline>Headline 2 </headline>
        <image> Image2 </image>
        <description>Decs 2 </description>
      </page>
    </story>

    If you want to process each <page> separately, as the file is being parsed, you could do something like this:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use XML::Rules;
    use Data::Dumper;

    my $xml = do { local $/; <DATA> };

    my $parser = XML::Rules->new(
        rules => [
            _default => 'content trim',
            page     => sub {
                print <<"*END*"
    "$_[1]->{headline}" by $_[1]->{author}
    Keywords: $_[1]->{keywords}
    Description: $_[1]->{description}
    *END*
            },
            story => '',
        ]
    );

    $parser->parse($xml);

    __DATA__
    ...

Node Type: perlquestion [id://625013]
Approved by GrandFather