conrad has asked for the wisdom of the Perl Monks concerning the following question:

Hi there,

I'm trying to parse an XML file into an object structure — my own object structure. I'm bewildered by the range of XML parsers and somewhat confused by the fact that none of them seem to do what I want — although this may be because all of the examples of their use that I've found are too simple to illustrate the full capabilities of the various parsers.

What I was expecting was that there'd be a parser which would allow me to create a range of subclasses of its Element class, and that these subclasses would have their start_tag and end_tag methods called as an XML document was being parsed — the start_tag acting as a constructor, and the end_tag checking the parsed list of contents objects and actually storing them in whatever way it wanted to.

I've not found anything that seems to do this, but one of the Perl XML parsers must; I just can't find which!

Wishful code attached. The first part just uses XML::Parser's Objects mode, which is the closest approximation of what I want that I found, but it's still using its own structures to store the data and doesn't make callbacks against the classes it's blessing the data into. The second part sketches an example pair of classes in the style that I was expecting some parser or other to support — XML::Parser doesn't call their _tag methods though, of course.

Any suggestions?

Update 0: As I say below in response to tlm's question, I've discovered a name for what I'm after: XML Data Binding. Here's an article on the subject. It's Java-centric though, and doesn't offer any Perl answers...

Update 1: have written an extension to XML::Parser to do the job. See below.

Conrad

#!/usr/bin/perl
use strict;
use warnings;

use XML::Parser;
use Data::Dumper;

my $text = '<container><thing>hi</thing><thing>there</thing></container>';
my $parser = XML::Parser->new(Style => 'Objects', Pkg => 'MyPkg');
my $object = $parser->parse($text)->[0];
print Dumper($object);

{
    package MyPkg::container;
    # use base 'XML::Element'; # Or something like that?

    # Constructor method, nothing exciting happening here so
    # it's redundant but just to show I could have one...
    sub start_tag {
        my ($class, $container) = @_;
        # The parser should use start_tag's return value when
        # returning this parsed object to anything containing it (or
        # the caller if this is the root object).
        bless {}, $class
    }

    sub end_tag { # Should get contained objects as values
        # Check accumulated contents and store values in my hash -
        # here I'm expecting only a list of <thing>s...
        my ($self, @contents) = @_;
        # ref($_) is parenthesised because a bare "ref $_ . ..." would
        # apply ref to the concatenated string.
        die "Non-thing (" . ref($_) . ") in container!"
            for grep ! $_->isa('MyPkg::thing'), @contents;
        $self->{THINGS} = \@contents;
    }

    sub myMethod {
        # Execute stuff using parsed objects
    }
}

{
    package MyPkg::thing;
    # use base 'XML::Element'; # Or something like that?

    sub end_tag { # Should get contained objects as values
        # Check accumulated contents and store values in my hash -
        # here I'm expecting only plain text in the form of scalars.
        my ($self, @contents) = @_;
        die "Non-scalar (" . ref($_) . ") in thing!"
            for grep ref $_, @contents;
        $self->{DATA} = join('', @contents);
    }
}

Re: Parse XML to my own objects?
by tlm (Prior) on Jun 28, 2005 at 12:34 UTC

    Both XML::Parser and XML::Twig let you register callbacks associated with various parsing events (such as the starts and ends of tags). This should be enough to do what you want, I think.

    the lowliest monk
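The callback registration mentioned above can be sketched with XML::Parser's Handlers option. This is only a minimal illustration, not code from the thread: Start, End and Char are XML::Parser's handler names, while the @events array and the sample document are assumptions made up for the example.

```perl
use strict;
use warnings;
use XML::Parser;

my @events;   # illustrative: a record of parse events in document order

my $parser = XML::Parser->new(
    Handlers => {
        # Start receives the Expat object, the element name, then attribute pairs.
        Start => sub { my ($expat, $name, %attrs) = @_; push @events, "start:$name" },
        # End receives the Expat object and the element name.
        End   => sub { my ($expat, $name) = @_; push @events, "end:$name" },
        # Char receives a chunk of text; one text run may arrive in several chunks.
        Char  => sub { my ($expat, $text) = @_; push @events, "text:$text" },
    },
);

$parser->parse('<container><thing>hi</thing></container>');
print "$_\n" for @events;
```

Note that there is a single Start handler and a single End handler for the whole document, so any per-element behaviour has to be dispatched from inside them.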

      In XML::Parser's Stream and Subs styles the callbacks are definitely not per-element, so you'd need a gigantic switch-or-equivalent statement in your one-and-only StartTag routine to identify and instantiate the particular class you're after. It also maintains no context (such as a parse stack) for you to tie into; I understand that this is in fact a major reason most of the other Perl XML parsers were written, even though you can kind-of work around the problem using closures. And it's not at all OO.

      XML::Twig also doesn't seem to have a good solution, in that it instantiates all elements in the same class — a user-selected class to be sure, but always the same one.

      I've discovered a name for what I'm after: XML Data Binding. Here's an article on the subject. It's Java-centric though, and doesn't offer any Perl answers...
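One way to avoid both the giant switch and the missing parse stack is to derive a class name from each element name and keep the stack yourself. The following is a rough sketch under stated assumptions, not the extension described later in the thread: bind_xml and the frame layout are hypothetical, the start_tag/end_tag method names follow the wishful code in the question, and element names are assumed to be valid Perl package-name parts.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical element classes, mirroring the wishful code in the question.
{
    package MyPkg::container;
    sub start_tag { my ($class, %attrs) = @_; return bless {%attrs}, $class }
    sub end_tag {
        my ($self, @contents) = @_;
        $self->{THINGS} = [ grep { ref $_ } @contents ];   # keep child objects only
    }
}
{
    package MyPkg::thing;
    sub start_tag { my ($class, %attrs) = @_; return bless {%attrs}, $class }
    sub end_tag {
        my ($self, @contents) = @_;
        $self->{DATA} = join '', @contents;                # text chunks only
    }
}

# bind_xml is an illustrative helper name, not part of XML::Parser.
sub bind_xml {
    my ($xml, $pkg) = @_;
    # Hand-rolled parse stack; the bottom frame collects the root object.
    my @stack = ({ obj => undef, contents => [] });
    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my ($expat, $name, %attrs) = @_;
                my $class = "${pkg}::$name";               # dispatch by name, no switch
                die "No class for <$name>\n" unless $class->can('start_tag');
                push @stack, { obj => $class->start_tag(%attrs), contents => [] };
            },
            Char => sub {
                my ($expat, $text) = @_;
                push @{ $stack[-1]{contents} }, $text;
            },
            End => sub {
                my ($expat, $name) = @_;
                my $frame = pop @stack;
                $frame->{obj}->end_tag(@{ $frame->{contents} });
                push @{ $stack[-1]{contents} }, $frame->{obj};   # hand object to parent
            },
        },
    );
    $parser->parse($xml);
    return $stack[0]{contents}[0];
}

my $root = bind_xml(
    '<container><thing>hi</thing><thing>there</thing></container>', 'MyPkg');
print join(' ', map { $_->{DATA} } @{ $root->{THINGS} }), "\n";   # prints "hi there"
```

Each class sees only its own attributes and contents, so structural checks stay local to the class instead of living in one omniscient routine.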

Re: Parse XML to my own objects?
by jeffa (Bishop) on Jun 28, 2005 at 13:19 UTC

      Thanks, but in the same way as XML::Parser, it doesn't quite do what I want — decodes to a data structure, not an object structure, and then afterwards instantiates the data into objects. I'd rather not have to descend the data tree after the fact, examining it for structural correctness in one omniscient sub and blessing it piecemeal into objects… So far as I can see, the one-method-per-class thing I'm looking for is eminently possible but may just not yet have been implemented.

        I think it is going to be rather difficult to create the objects while you parse the object tree. You have to have some kind of knowledge of what you are going to expect. At any rate, using a similar technique to the code I linked to, here is a script that parses the example XML from the link you provided. Hope it helps. :)

        jeffa

        L-LL-L--L-LL-L--L-LL-L--
        -R--R-RR-R--R-RR-R--R-RR
        B--B--B--B--B--B--B--B--
        H---H---H---H---H---H---
        (the triplet paradiddle with high-hat)
        
        For what it's worth... I'd be interested in seeing this :-)

        (Once again, just as I'm trying to get my head around something, a question about it pops up on PerlMonks. This is getting spooky!)

Re: Parse XML to my own objects? (aka XML Data Binding)
by conrad (Beadle) on Jun 28, 2005 at 15:55 UTC

    Hrm. Am too impatient. Implemented solution. Try this extension to XML::Parser — a new Style implementing XML Data Binding in one direction only. Turns out to be easy, now all I need to do is persuade the author that this is useful :-/

    Comments & suggestions welcome.

Re: Parse XML to my own objects? (aka XML Data Binding)
by jdporter (Paladin) on Sep 21, 2005 at 22:55 UTC
    I would think Class::XML should give you very close to exactly what you want.