John M. Dlugosz has asked for the wisdom of the Perl Monks concerning the following question:

I'm reading some data using a DOM interface (XML::XPath, hoping for XML::LibXML some day). A DOM seems to be the way to go if you need more than "simple": it gives complete coverage of the feature set, and it is a standard interface. However, it's not Perl. Rather than picking through the resulting Element and its related objects every time I need some data, I'm "deserializing" the data of interest into Perl data structures.

That is what XML::Simple does, but without enough guidance about what is wanted, and without every feature being available. It is also what .NET (excuse my French) does with its "Deserialize" attributes, something I've wrestled with because it's poorly designed and incomplete.

The point is, there is a step that does this transformation, and the work is mostly simple and repetitive. In my initial effort, I made that step a function: take a DOM Node of type Element and return a Foo structure (a dumb structure, not a class). To eliminate duplicate code, I pushed the most common task into a separate function: take a list of names, and copy the attributes with those names (if they exist) into a destination hash.
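A minimal sketch of that attribute-copying helper follows. It assumes XML::LibXML as the DOM (the thread mentions it as the hoped-for replacement for XML::XPath); the name `copy_attrs` and the sample `<foo>` element are hypothetical:

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: copy the named attributes of a DOM Element into
# the hash referenced by $dest, skipping any the element does not have.
sub copy_attrs {
    my ($elem, $dest, @names) = @_;
    for my $name (@names) {
        $dest->{$name} = $elem->getAttribute($name)
            if $elem->hasAttribute($name);
    }
    return $dest;
}

my $doc  = XML::LibXML->load_xml(string => '<foo id="7" name="widget" colour="red"/>');
my $elem = $doc->documentElement;

# Only 'id', 'name' and 'size' are requested; 'size' is absent and is
# skipped, and 'colour' is ignored because it was not asked for.
my %foo;
copy_attrs($elem, \%foo, qw(id name size));
```

Because absent attributes are skipped rather than stored as `undef`, the caller can use `exists` to tell "not given" apart from an empty value.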

The idea is that a small number of to-the-point functions, each with a reasonable amount of flexibility within its own purpose, can be called to easily implement the function that converts a DOM Element into a Foo, for any Foo. The alternative is a huge, complex configuration structure that does it all automatically once set up, assuming the designer thought of everything. The provided functions can be used where applicable, with a small amount of code between them to handle odd cases, and nothing stops you from doing something totally different.
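A minimal sketch of that composition, again assuming XML::LibXML as the DOM. The helper names (`copy_attrs`, `map_children`), the two converters, and the `<order>`/`<part>` vocabulary are all hypothetical, chosen only to show small helpers plus a little hand-written code for the odd case:

```perl
use strict;
use warnings;
use XML::LibXML;

# --- small, single-purpose helpers (hypothetical names) ---

# Copy the listed attributes, if present, into a hash.
sub copy_attrs {
    my ($elem, $dest, @names) = @_;
    for my $name (@names) {
        $dest->{$name} = $elem->getAttribute($name) if $elem->hasAttribute($name);
    }
    return $dest;
}

# Convert every direct child element named $tag with $conv; return an array ref.
sub map_children {
    my ($elem, $tag, $conv) = @_;
    return [ map { $conv->($_) } $elem->getChildrenByTagName($tag) ];
}

# --- per-type converters built from the helpers ---

sub part_from_element {
    my ($elem) = @_;
    my %part = (text => $elem->textContent);
    copy_attrs($elem, \%part, qw(sku qty));
    return \%part;
}

sub order_from_element {
    my ($elem) = @_;
    my %order;
    copy_attrs($elem, \%order, qw(id customer));
    $order{parts} = map_children($elem, 'part', \&part_from_element);
    # An "odd" case handled by plain code: a yes/no attribute becomes 1/0.
    $order{rush} = ($elem->getAttribute('rush') // 'no') eq 'yes' ? 1 : 0;
    return \%order;
}

my $xml = <<'XML';
<order id="42" customer="acme" rush="yes">
  <part sku="A1" qty="2">bolt</part>
  <part sku="B7">washer</part>
</order>
XML

my $order = order_from_element(XML::LibXML->load_xml(string => $xml)->documentElement);
```

Each converter stays a few lines long; anything the helpers don't cover (here, turning `rush="yes"` into a boolean) is ordinary Perl in between, rather than another option in a configuration structure.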

So why isn't this already a nice CPAN module?

Has everyone written code like this? Or am I missing something?

—John

Re: Reading structures from XML
by Your Mother (Archbishop) on May 06, 2009 at 19:37 UTC

    I think the problem is that there just isn't a 1:1 way to do this. This is why XML::Simple gets so hairy. Consider:

    <container> <thing foo="bar">baz</thing> <thing qux="oop" foo=""/> <thing/> </container>

    What could that look like in a Perl structure? It would need objects mimicking the DOM, or named placeholders such as "content" and "children".

    So I don't think this is easy to solve; a solution will probably be messy, or just a different API to the DOM that's almost as complicated. I'd love to be proved wrong in this case. :) I'm an XML::LibXML user who would also love a straight mapping to Perl structures now and then.

        No, I think I was the one missing something. That's pretty nice. I hadn't even considered using an empty hash key to point to the content. It's a neat way to handle what I'd seen as semi-intractable.
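For the `<container>` example, one possible Perl shape using that empty-key convention might look like the following. The layout is an assumption for illustration (attributes as ordinary keys, child elements grouped in an array under their tag name, text content under `''`, whitespace-only text between elements ignored), not the output of any particular module:

```perl
use strict;
use warnings;

# Illustrative mapping of:
#   <container> <thing foo="bar">baz</thing> <thing qux="oop" foo=""/> <thing/> </container>
# Attributes are plain keys, child elements are grouped by tag name in an
# array (preserving their order), and '' holds an element's text content.
my $container = {
    thing => [
        { foo => 'bar', '' => 'baz' },   # attribute plus text content
        { qux => 'oop', foo => '' },     # empty attribute, no content
        {},                              # no attributes, no content
    ],
};

# Telling an empty attribute apart from a missing one needs exists, not truth.
for my $i (0 .. $#{ $container->{thing} }) {
    my $t       = $container->{thing}[$i];
    my $foo     = exists $t->{foo} ? q{'} . $t->{foo} . q{'} : '(absent)';
    my $content = exists $t->{''}  ? q{'} . $t->{''}  . q{'} : '(none)';
    print "thing $i: foo $foo, content $content\n";
}
# thing 0: foo 'bar', content 'baz'
# thing 1: foo '', content (none)
# thing 2: foo (absent), content (none)
```

This shape loses the relative order of text and child elements, so truly mixed content still needs something closer to the DOM.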

      That's my point: it is neither simple nor automatic. In Perl, you design Perl data structures, or object constructors and subsequent manipulators. In XML, you design a good XML-ish way of doing it. .NET tries to make you think the in-program structure and the XML are one and the same, but the design issues are different, and you end up with a lame serialization of the in-program data rather than a proper XML-minded design.

      So, in general, have the user write a function that takes the Element and returns the Perl object, but provide a nifty library of helpers that make writing that function easy.

      I agree your example really needs a general DOM. But most real programs designed to act on information read from an XML file will have data designed for that purpose, i.e. it will be sane and meaningful.

      —John

Re: Reading structures from XML
by Jenda (Abbot) on May 07, 2009 at 00:37 UTC

    I have to say that I do not understand what you mean. Could you give us a few examples? So you want to convert all DOM Element objects created from a tag with a given name into some specific data structure? What do you do with the children? How do you store the result in the parent DOM Element? Or am I completely off?

    If I want to deserialize XML, I use XML::Rules, most likely starting with the ruleset generated by inferRulesFromExample() or inferRulesFromDTD() and tweaking it as needed to get just the data I need, in the shape I need it. Is something like this (skipping the DOM generation) roughly what you need?

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.