XML::Parser and objects

by Ineffectual (Scribe)
on Feb 28, 2003 at 01:18 UTC ( [id://239310]=perlquestion )

Ineffectual has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing an object-oriented parser for complex XML documents (BLAST output). I need to parse certain fields out of a complex tree and skip the ones I don't need (or at least avoid saving their data).

Hit_num is the first element that I'd like to save in the file (it is a child element of other things that I don't want to save).

The main problem (right now) is that I'm not sure whether this is going to work in an object-oriented style or whether I should just drop that part. What I'm running into is that I can't pass variables between the start_handler, char_handler, and end_handler. I don't know if there's a way to do this in XML::Parser; I've looked at the docs and can't figure that part out. Perhaps this is because I'm using parsefile instead of parse?

As you can probably tell by now, this is my first script involving XML and any advice would be appreciated.

-----
my $pack = NCBIXML->new;
$pack->createHandlers;
my $parser = $pack->retrieve('HNDL');
$parser->parsefile($file);
my %data = $parser->retrieve('RES');

##############

sub createHandlers {
    my $obj = shift;
    my $parser = new XML::Parser(ErrorContext=>2);
    $parser->setHandlers( Start => \&start_handler);
    $parser->setHandlers( End => \&end_handler);
    $parser->setHandlers( Char => \&char_handler);
    $obj->{'HNDL'} = $parser;
}

sub start_handler {
    my ($obj, $element, %attrs) = @_;
    $tag = $element;    # primary key for the array
    $obj->{'TAG'} = $tag;
}

sub char_handler {
    my ($obj, $data) = @_;
    my $tag = $obj->retrieve('TAG');
    my %info = $obj->retrieve('RES');
    if ($tag eq 'Hit_num') {
        $key = $data;
        next;
    }
    $info{$key}{$tag} = $data;
    $obj->{'RES'} = \%info;
}
Thanks for any help. Ineff

Replies are listed 'Best First'.
Re: XML::Parser and objects
by diotalevi (Canon) on Feb 28, 2003 at 01:47 UTC

    Interesting. I was having a similar problem and I solved it by making each of the handlers a closure over a shared lexical. I wrote each handler as a named subroutine, but it is probably better written as a generator. I'll include samples of each mode: what I actually used and what I think is better.

    # My actual implementation
    {
        my $object;
        sub setobject { $object = shift; }

        # Each of these routines has access to the shared
        # lexical. Enclosing this whole section in a block
        # prevents the lexical from being used in other scopes
        sub start_handler { ... }
        sub char_handler { ... }
        sub end_handler { ... }
    }

    # What is probably better though there is a heck of a lot
    # of space to move this around. I implemented this as a
    # wrapper over some named subroutines and passed the shared
    # variable in. You can really do this however you feel.
    sub get_handlers {
        my $object;
        return {
            start_handler => sub { start_handler(\@_,\$object) },
            end_handler => sub { end_handler(\@_,\$object) },
            char_handler => sub { char_handler(\@_,\$object) }
        }
    }
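
For readers who want to try the generator idea, here is a minimal sketch of one way it could look, not diotalevi's actual code. The name make_handlers, the %state layout, and the sample Hit_id value are assumptions made for illustration. The handlers take the same arguments XML::Parser passes (the Expat object first, then the element name or text), but the events are fired by hand here so the sketch runs without any XML module installed.

```perl
use strict;
use warnings;

# Hypothetical generator: returns three handlers that all close over
# the same %state, so they can share data without globals.
sub make_handlers {
    my %state = (tag => '', key => undef, res => {});

    my $start = sub {
        my ($expat, $element, %attrs) = @_;
        $state{tag} = $element;             # remember the open element
    };

    my $char = sub {
        my ($expat, $text) = @_;
        my $tag = $state{tag};
        return if $tag eq '' || $text !~ /\S/;   # skip inter-element whitespace
        if ($tag eq 'Hit_num') {
            $state{key} = $text;            # assumes Hit_num arrives in one chunk
            return;
        }
        return unless defined $state{key};  # ignore data before the first Hit_num
        $state{res}{ $state{key} }{$tag} .= $text;
    };

    my $end = sub {
        my ($expat, $element) = @_;
        $state{tag} = '';                   # no element is open any more
    };

    return ({ Start => $start, Char => $char, End => $end }, $state{res});
}

my ($handlers, $results) = make_handlers();

# Real use (assumes XML::Parser is installed):
#   XML::Parser->new(Handlers => $handlers)->parsefile($file);
# Here the events are simulated by hand.
$handlers->{Start}->(undef, 'Hit_num');
$handlers->{Char}->(undef, '1');
$handlers->{End}->(undef, 'Hit_num');
$handlers->{Start}->(undef, 'Hit_id');
$handlers->{Char}->(undef, 'gi|12345');
$handlers->{End}->(undef, 'Hit_id');

print "$results->{1}{Hit_id}\n";
```

Because the results hash is returned alongside the handlers, the caller can read it after parsing without reaching into the parser object.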

    Seeking Green geeks in Minnesota

Re: XML::Parser and objects
by mirod (Canon) on Feb 28, 2003 at 08:58 UTC

    It's time for the obnoxious XML::Twig guy to step in I guess ;--)

    XML::Twig is designed for that kind of situation: it will let you load a view of the document that includes only the root and the Hit_num elements:

    my $twig= XML::Twig->new( twig_roots => { Hit_num => 1 } );
    $twig->parsefile( $file);
    my @hit_nums= $twig->root->children;
    # do stuff with the Hit_num's

    Alternatively you can handle each Hit_num during the parsing:

    my $twig= XML::Twig->new( twig_roots => { Hit_num => \&hit_num } );
    $twig->parsefile( $file);

    sub hit_num {
        my( $t, $hit_num)= @_;
        # do stuff with the hit_num
        # $t->purge will free the memory if you don't need
        # to keep the hit_num around
    }

    If you need to pass additional info to the handler you can use a closure, as diotalevi showed:

    my $twig= XML::Twig->new( twig_roots => { Hit_num => sub { hit_num( @_, $state_info); } } );
    # ...
    sub hit_num {
        my( $t, $hit_num, $state_info)= @_;
        # ...
    }

    BTW there was a pretty good article by Simon Cozens on perl.com a while ago that gives more details on closures: Achieving Closure.

Re: XML::Parser and objects
by grantm (Parson) on Feb 28, 2003 at 06:32 UTC

    You have recognised one of the main problems with the XML::Parser API and one of the reasons coding directly to its API should be considered deprecated. If you want to take an event-based approach to your parsing, then the SAX API is a better bet. As well as being OO all the way, it's independent of the underlying parser module and allows you to take advantage of off-the-shelf filter modules.

    Switching your code from XML::Parser's handler API to SAX is not a major undertaking. For a primer, start with the introduction in the distribution or look for Kip Hampton's articles on XML.com. XML::SAX::Expat is probably the easiest SAX parser to install since you already have expat.
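
To give a rough idea of what the SAX version might look like, here is a minimal sketch of a handler object, not grantm's code. The class name HitHandler, its results method, and the sample Hit_id value are assumptions made for illustration. In Perl SAX each event is passed as a hash reference (the element name under Name, character data under Data), and since the handler is an ordinary object, state simply lives in $self. The events are fired by hand so the sketch runs without XML::SAX installed.

```perl
use strict;
use warnings;

package HitHandler;    # hypothetical class name, not from the thread

sub new {
    my ($class) = @_;
    return bless { tag => '', key => undef, res => {} }, $class;
}

# SAX passes each event as a hash ref; the element name is in {Name}.
sub start_element {
    my ($self, $el) = @_;
    $self->{tag} = $el->{Name};
}

# Character data arrives in {Data}.
sub characters {
    my ($self, $chars) = @_;
    my $text = $chars->{Data};
    my $tag  = $self->{tag};
    return if $tag eq '' || $text !~ /\S/;   # skip inter-element whitespace
    if ($tag eq 'Hit_num') {
        $self->{key} = $text;                # assumes Hit_num arrives in one chunk
        return;
    }
    return unless defined $self->{key};      # ignore data before the first Hit_num
    $self->{res}{ $self->{key} }{$tag} .= $text;
}

sub end_element {
    my ($self, $el) = @_;
    $self->{tag} = '';
}

sub results { return $_[0]{res} }

package main;

my $handler = HitHandler->new;

# Real use (assumes XML::SAX and a driver such as XML::SAX::Expat are installed):
#   XML::SAX::ParserFactory->parser(Handler => $handler)->parse_uri($file);
# Here the events are simulated by hand.
$handler->start_element({ Name => 'Hit_num' });
$handler->characters({ Data => '1' });
$handler->end_element({ Name => 'Hit_num' });
$handler->start_element({ Name => 'Hit_id' });
$handler->characters({ Data => 'gi|12345' });
$handler->end_element({ Name => 'Hit_id' });

print $handler->results->{1}{Hit_id}, "\n";
```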
