I am sticking with XML::LibXML; especially because I need to be validating my input with a local schema prior to the actual parsing.

Then validate the input prior to actual parsing and do not let the choice of validator affect your choice of parser (extractor).
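
A minimal sketch of that separation, assuming XML::LibXML::Schema for the validation step and XML::Rules for the extraction step. The inline schema, the sample document and the helper name validate_then_extract are hypothetical stand-ins for your admin_server.xsd and real input, not taken from your code:

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::Rules;

# Hypothetical toy schema; in practice you would load the local .xsd
# with XML::LibXML::Schema->new( location => ... ) instead.
my $xsd = <<'XSD';
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="request">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="action" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD

# Hypothetical sample input.
my $xml = '<request><action>restart</action></request>';

sub validate_then_extract {
    my ($xml, $xsd) = @_;

    # Step 1: validate with XML::LibXML. validate() dies on an invalid document.
    my $schema = XML::LibXML::Schema->new( string => $xsd );
    my $doc    = XML::LibXML->load_xml( string => $xml );
    $schema->validate($doc);

    # Step 2: extract with XML::Rules, which knows nothing about the validator.
    my %found;
    my $parser = XML::Rules->new(
        stripspaces => 7,
        rules       => {
            _default => 'content',
            request  => sub {
                my ($tag, $attr) = @_;
                %found = %{$attr};   # copy the child data collected for <request>
                return;              # keep nothing in the parser's own tree
            },
        },
    );
    $parser->parse($xml);
    return \%found;
}

my $data = validate_then_extract($xml, $xsd);
print $data->{action}, "\n";
```

The document is read twice here, once per step. That is the price of keeping the two choices independent.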

If you want to do something with each action_request/info_request right away:

use strict;
use XML::Rules;

my $parser = XML::Rules->new(
    stripspaces => 7,
    namespaces  => {
        'http://www.somedomain.tld/market_reg/admin_server/1.0' => '',
        'http://www.w3.org/2001/XMLSchema-instance'             => 'xsi',
    },
    rules => {
        _default                      => 'content',
        instance_information          => 'as is',
        'action_request,info_request' => sub {
            my ($tag, $attr) = @_;
            print $attr->{action}, "\n";
            while ( my ($k, $v) = each %{ $attr->{instance_information} } ) {
                print " $k: $v\n";
            }
            print "\n";
            return;
        },
    },
);

$parser->parse(\*DATA);

__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<instruction_request xmlns="http://www.somedomain.tld/market_reg/admin_server/1.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.somedomain.tld/market_reg/admin_server/1.0 admin_server.xsd">
  <request>
  ...

If you want to extract the data and nothing but the data:

use strict;
use XML::Rules;

my $parser = XML::Rules->new(
    stripspaces => 7,
    namespaces  => {
        'http://www.somedomain.tld/market_reg/admin_server/1.0' => '',
        'http://www.w3.org/2001/XMLSchema-instance'             => 'xsi',
    },
    rules => {
        _default                      => 'content',
        instance_information          => 'pass',
        'action_request,info_request' => 'pass',
        'request'                     => 'as array',
        'instruction_request'         => sub { $_[1]->{request} },
    },
);

my $data = $parser->parse(\*DATA);

use Data::Dumper;
print Dumper($data);

__DATA__
...

If you need to distinguish between action and info requests:

...
'action_request,info_request' => sub {
    my ($tag, $attr) = @_;
    $attr->{type} = $tag;
    return %{$attr};
},
...

In the first case, only the data of a single <request> is in memory at any time. In the other cases all of the data ends up in memory, but trimmed down substantially.

Jenda
Enoch was right!
Enjoy the last years of Rome.


In reply to Re: XML::LibXML - parsing question!! by Jenda
in thread XML::LibXML - parsing question!! by MarkovChain
