in reply to XML::LibXML - parsing question!!

I am sticking with XML::LibXML; especially because I need to be validating my input with a local schema prior to the actual parsing.

Then validate the input prior to actual parsing and do not let the choice of validator affect your choice of parser (extractor).
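
To make the separation concrete, here is a minimal sketch that validates with XML::LibXML::Schema and then extracts with XML::Rules. The inline schema and document are hypothetical stand-ins (the real admin_server.xsd is not shown in this thread), so treat the element names as assumptions.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::Rules;

# Hypothetical stand-in schema; the real admin_server.xsd is not shown here.
my $xsd = <<'XSD';
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="request">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="action" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD

# Hypothetical input document matching the stand-in schema.
my $xml = '<request><action>restart</action></request>';

# Step 1: validate. XML::LibXML::Schema::validate dies on invalid input.
my $schema = XML::LibXML::Schema->new(string => $xsd);
my $doc    = XML::LibXML->load_xml(string => $xml);
my $valid  = eval { $schema->validate($doc); 1 };
die "Invalid input: $@" unless $valid;

# Step 2: extract with a different module; it never sees the validator.
my $parser = XML::Rules->new(
    stripspaces => 7,
    rules       => {
        _default => 'content',
        request  => 'as is',
    },
);
my $data = $parser->parse($xml);
print "action: $data->{request}{action}\n";
```

With a schema on disk you would pass location => 'admin_server.xsd' to XML::LibXML::Schema->new instead of string.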

If you want to do something with the action_request/info_request right away:

use strict;
use XML::Rules;

my $parser = XML::Rules->new(
    stripspaces => 7,
    namespaces => {
        'http://www.somedomain.tld/market_reg/admin_server/1.0' => '',
        'http://www.w3.org/2001/XMLSchema-instance' => 'xsi',
    },
    rules => {
        _default => 'content',
        instance_information => 'as is',
        'action_request,info_request' => sub {
            my ($tag, $attr) = @_;
            print $attr->{action}, "\n";
            while ( my ($k, $v) = each %{$attr->{instance_information}} ) {
                print " $k: $v\n";
            }
            print "\n";
            return;
        },
    },
);

$parser->parse(\*DATA);

__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<instruction_request xmlns="http://www.somedomain.tld/market_reg/admin_server/1.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.somedomain.tld/market_reg/admin_server/1.0 admin_server.xsd">
<request>
...

If you want to extract the data and just the data:

use strict;
use XML::Rules;

my $parser = XML::Rules->new(
    stripspaces => 7,
    namespaces => {
        'http://www.somedomain.tld/market_reg/admin_server/1.0' => '',
        'http://www.w3.org/2001/XMLSchema-instance' => 'xsi',
    },
    rules => {
        _default => 'content',
        instance_information => 'pass',
        'action_request,info_request' => 'pass',
        'request' => 'as array',
        'instruction_request' => sub { $_[1]->{request} },
    },
);

my $data = $parser->parse(\*DATA);

use Data::Dumper;
print Dumper($data);

__DATA__
...

If you need to distinguish between action and info requests:

...
'action_request,info_request' => sub {
    my ($tag, $attr) = @_;
    $attr->{type} = $tag;
    return %{$attr};
},
...

In the first case, only the data of one <request> is in memory at any time; in the others, the whole data set ends up in memory, though trimmed down substantially.
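
For reference, the request-type variant could be assembled into a complete script roughly like this. The sample input is hypothetical: the element names follow the rules above, but the contents are invented and the namespaces are left out to keep the sketch short.

```perl
use strict;
use warnings;
use XML::Rules;

# Hypothetical sample input; the real document is not shown in this thread.
my $xml = <<'XML';
<instruction_request>
  <request>
    <action_request>
      <action>restart</action>
      <instance_information><name>alpha</name></instance_information>
    </action_request>
  </request>
  <request>
    <info_request>
      <action>status</action>
      <instance_information><name>beta</name></instance_information>
    </info_request>
  </request>
</instruction_request>
XML

my $parser = XML::Rules->new(
    stripspaces => 7,
    rules       => {
        _default             => 'content',
        instance_information => 'pass',
        # Record the tag name so the two request kinds can be told apart,
        # then merge everything into the enclosing <request>.
        'action_request,info_request' => sub {
            my ($tag, $attr) = @_;
            $attr->{type} = $tag;
            return %{$attr};
        },
        request             => 'as array',
        instruction_request => sub { $_[1]->{request} },
    },
);

my $data = $parser->parse($xml);

# Each entry is a flat hash carrying the extracted fields plus 'type'.
for my $request (@{$data}) {
    print "$request->{type}: $request->{action} ($request->{name})\n";
}
```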

Jenda
Enoch was right!
Enjoy the last years of Rome.

Re^2: XML::LibXML - parsing question!!
by MarkovChain (Sexton) on Dec 22, 2009 at 16:42 UTC

    Hi Jenda,

    Thanks for the feedback.

    The idea of validating before parsing is indeed what I intend to do in my final project. I was just doing a dry run in a test script to get comfortable with the modules before making changes to my code branch.

    That being said, good thoughts!! I will take a look at the XML::Rules module. I am currently upgrading libxml2 on my Mac. It comes with 2.6.16 installed and the latest is 2.7* .... Still, I have a pretty rudimentary regex and it should pass. It passes in my XML editor when it validates against the said XML Schema.