Vynce has asked for the wisdom of the Perl Monks concerning the following question:

recently, for my own devious porpoises, i wrote a perlSAX handler to turn my XML into perl objects. the concept seems to me sanely generalizable, and i intend to write it up and offer it on the steps of the Temple at CPAN. but it seems to me to be a whale of a project, so first i wanted to consult the oracles and monks here for their wisdom.

the basic premise is that you might, in an object-oriented environment, already have a lot of object classes that are strikingly similar to being perl-based analogs of your XML elements. for example, if you have an XML Document featuring BOOK, CHAPTER, PARAGRAPH and LINE elements, you may well have perl object packages Publish::Book, Publish::Chapter, etc. that you want to create based on these elements. so why not have the handler turn all your precious elements into objects?
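a minimal sketch of that premise, under some loud assumptions: the `ObjectBuilder` handler, its `map` option and the two tiny `Publish::*` classes are all hypothetical stand-ins, and the events are fed in by hand rather than by a real parser. it only relies on the PerlSAX convention that each event arrives as a hash reference with `Name` and `Attributes` keys.

```perl
use strict;
use warnings;

# Hypothetical classes standing in for ones the coder has already written.
package Publish::Book;
sub new { my ($class, %args) = @_; return bless { %args, children => [] }, $class }

package Publish::Chapter;
sub new { my ($class, %args) = @_; return bless { %args, children => [] }, $class }

# Hypothetical handler: builds one object per element it has a class for.
package ObjectBuilder;

sub new {
    my ($class, %opt) = @_;
    return bless { map => $opt{map}, stack => [], root => undef }, $class;
}

# PerlSAX passes each event to the handler as a hash reference.
sub start_element {
    my ($self, $event) = @_;
    my $class = $self->{map}{ $event->{Name} }
        or die "no class registered for <$event->{Name}>\n";
    my $object = $class->new( %{ $event->{Attributes} || {} } );
    if ( my $parent = $self->{stack}[-1] ) {
        push @{ $parent->{children} }, $object;    # nest under the open element
    }
    else {
        $self->{root} = $object;                   # first element is the root
    }
    push @{ $self->{stack} }, $object;
    return;
}

sub end_element {
    my ($self, $event) = @_;
    pop @{ $self->{stack} };
    return;
}

package main;

my $handler = ObjectBuilder->new(
    map => { BOOK => 'Publish::Book', CHAPTER => 'Publish::Chapter' },
);

# Events are fed by hand here; a real PerlSAX parser would make these calls.
$handler->start_element({ Name => 'BOOK',    Attributes => { title  => 'Moby Dick' } });
$handler->start_element({ Name => 'CHAPTER', Attributes => { number => 1 } });
$handler->end_element({ Name => 'CHAPTER' });
$handler->end_element({ Name => 'BOOK' });

my $book = $handler->{root};
printf "%s with %d chapter(s)\n", ref($book), scalar @{ $book->{children} };
```

the handler never creates a class; it only calls `new` on classes named in the map, and dies on an element it has no class for.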

so, three of the many spokes in the wheel of my development:

  1. this idea seems round, and likely to roll. it may even be able to bear weight and take projects from one place to another. as such, it seems possible that someone has created it already. did i miss it in my glance through the used car lot? i wouldn't want to re-invent it.
  2. feeping creatures wanted at local meat packing plant... well, you can call it a zoo if you prefer. what would you want out of such a module? more importantly, what would you want in such a module? i will take care of your ideas as if they were my own (: (sorry, Intellectual Property is nothing to joke about in a place like this, i suppose. forgive my flippancy, please.)
  3. perhaps i have meditated too long by my $self in the rock garden at Mt. Crack. does this idea seem useless or, worse, bad to you? i am devout in my dedications to XML and object oriented perl, but it may be that no slave can serve two masters, and i am fooling myself by trying to bring them into the same house in this fashion. your wisdom is appreciated.

thanks,

.

Re: handling my PerlSAX handler
by jeroenes (Priest) on May 21, 2001 at 12:56 UTC

    Vynce,

    I really like the freshness of your ideas. Though I'm not an actual POOPer, I have some feelings about namespace. One of the main rules is:

    Thou shalt not pollute thy namespace with auto-generated, out-of-control aka user-supplied names.
    So I wouldn't use any module that would take one XML file, and pollute my namespace with (potentially) unexpected names that could overwrite my original namespace. Well, and that spells disaster for your module that generates Package:: subclasses. Even if they just are nodes on your own module. If, however, there is one or another XML standard that nicely translates into an object-tree, then it's OK. But in that case, you are writing a package for a specific XML standard subclass rather than a generic XML parser.

    But the idea is cool. There are real XML experts over here, so wait for better answers. Until then, peek at Processing XML with Perl (by mirod) for a nice overview of different libs.

    Cheers,

    Jeroen
    "We are not alone"(FZ)

    Edit: chipmunk 2001-05-21

      Oooh, i see i was unclear.

      Thou shalt not pollute thy namespace with auto-generated, out-of-control aka user-supplied names

      Indeed! i meant to imply the case where the coder has already written the packages; the handler would create a new object of the appropriate type, not a new class.

      of course, you have to know what the classes are called for each element; i was thinking along the lines of:

          my $handler = new XML::PerlSAX::Handler::Sanctum(
              # assume all packages are named "Publish::$element"
              baseclass => 'Publisher',
              # prefer "Publish::$parent::$element"
              # over "Publish::$element", to allow a book title
              # to be a different object than a chapter title.
              heirarchic => 'preferred',
          );

          my $otherhandler = new XML::PerlSAX::Handler::Sanctum(
              dictionary => {
                  # clear cases...
                  BOOK      => 'Publish::Book',
                  CHAPTER   => 'Publish::Chapter',
                  PARAGRAPH => 'Publish::Paragraph',
                  # <BOOK><TITLE></TITLE></BOOK> gets treated differently
                  # than <CHAPTER><TITLE>... or <PARAGRAPH><TITLE>...
                  'BOOK TITLE'    => 'Publish::Book::Title',
                  'CHAPTER TITLE' => 'Publish::Chapter::Title',
                  TITLE           => 'Publish::SectionHeading',
              }
          );
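      one way the dictionary lookup might work, as a sketch only: `class_for` is a hypothetical helper, not part of any existing module. it assumes the handler knows the name of the enclosing element and tries the "PARENT ELEMENT" key before falling back to the bare element name.

```perl
use strict;
use warnings;

# Dictionary taken from the example above.
my %dictionary = (
    BOOK            => 'Publish::Book',
    CHAPTER         => 'Publish::Chapter',
    PARAGRAPH       => 'Publish::Paragraph',
    'BOOK TITLE'    => 'Publish::Book::Title',
    'CHAPTER TITLE' => 'Publish::Chapter::Title',
    TITLE           => 'Publish::SectionHeading',
);

# Hypothetical helper: try "PARENT ELEMENT" first, then plain "ELEMENT".
sub class_for {
    my ($dict, $element, $parent) = @_;
    if ( defined $parent && exists $dict->{"$parent $element"} ) {
        return $dict->{"$parent $element"};
    }
    return $dict->{$element};    # undef when the element is unknown
}

my $book_title = class_for(\%dictionary, 'TITLE', 'BOOK');
my $para_title = class_for(\%dictionary, 'TITLE', 'PARAGRAPH');
print "$book_title\n";    # Publish::Book::Title
print "$para_title\n";    # Publish::SectionHeading
```

      returning undef for an unknown element leaves the policy (skip it, die, or fall back to a generic class) up to the caller.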

      honestly, it had occurred to me to create the classes if they couldn't be found, but i decided that i'd only do that if the user told me to do so, and even then only if they didn't already exist. and maybe not even then; let the user auto-create their objects if they want.
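      a sketch of that rule, with hypothetical names throughout (`ensure_class`, the `autocreate` flag and the `Publish::Generic` fallback are assumptions made for illustration): an existing class is never touched, and a missing one is only created when the user explicitly asks.

```perl
use strict;
use warnings;

# A class the coder already wrote; it must never be replaced.
package Publish::Book;
sub new { return bless { handwritten => 1 }, shift }

# Hypothetical fallback parent for auto-created element classes.
package Publish::Generic;
sub new { return bless {}, shift }

package main;

# Hypothetical helper: returns true if the class is usable afterwards.
sub ensure_class {
    my ($class, %opt) = @_;
    return 1 if $class->can('new');      # already exists, leave it alone
    return 0 unless $opt{autocreate};    # only create when asked to
    no strict 'refs';
    @{"${class}::ISA"} = ('Publish::Generic');    # inherit new() from the fallback
    return 1;
}

ensure_class('Publish::Book', autocreate => 1);
printf "%s\n", Publish::Book->new->{handwritten} ? 'kept' : 'replaced';

my $without_flag = ensure_class('Publish::Line');
printf "%s\n", $without_flag ? 'usable' : 'missing';

ensure_class('Publish::Line', autocreate => 1);
printf "%s\n", ref( Publish::Line->new );
```

      setting `@ISA` rather than defining methods keeps the generated class empty, so a hand-written `Publish::Line` loaded later can simply take over.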

      thanks for the pointer to mirod's work, as well.

      .

Re: handling my PerlSAX handler
by aardvark (Pilgrim) on May 21, 2001 at 17:18 UTC
    First of all ++ to you for taking the time to raise such devious porpoises.
    If you are going to build a module to turn XML elements and attributes into perl objects, you might as well base it on a widely used DTD. It might be hard to generalize without using a solid reference point.

    I've just started using DocBook as our DTD and I was surprised that there was only one module on the CPAN that specifically addressed DocBook. It is a big DTD with over 300 elements, but it could take you down some interesting roads. It is well documented and there is an active mailing list for you to talk with.

    As XML becomes more widely used it seems there will be a need to write interfaces for specific DTDs. Personally I would love to see a module that would let me get/set Sect1 titles or get/set all the document authors and editors. People customize DocBook, and it would be cool if your module could read a DTD and provide an interface to all the elements and attributes that are being used.

    I don't know how devious your porpoises are, but, it would be interesting to see if you could 'feed' your module a DTD and have it create an interface to the document.

    my $parser = DeviousPorpoises->new('generic.dtd');
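    A rough sketch of what that might look like, with several assumptions stated up front: the constructor here takes the DTD text rather than a file name, the `get`/`set`/`elements` methods are hypothetical, and the element names are collected with a naive regex over `<!ELEMENT ...>` declarations. A heavily parameterized DTD such as DocBook would need a real DTD parser instead.

```perl
use strict;
use warnings;

package DeviousPorpoises;    # name borrowed from the line above

sub new {
    my ($class, $dtd_text) = @_;
    # Naive scan: collect every name declared with <!ELEMENT name ...>.
    my %declared = map { $_ => 1 } $dtd_text =~ /<!ELEMENT\s+([\w.:-]+)/g;
    return bless { declared => \%declared, value => {} }, $class;
}

# Names of all declared elements, sorted.
sub elements {
    my ($self) = @_;
    return sort keys %{ $self->{declared} };
}

sub set {
    my ($self, $name, $value) = @_;
    die "element '$name' is not declared in the DTD\n"
        unless $self->{declared}{$name};
    $self->{value}{$name} = $value;
    return $self;
}

sub get {
    my ($self, $name) = @_;
    die "element '$name' is not declared in the DTD\n"
        unless $self->{declared}{$name};
    return $self->{value}{$name};
}

package main;

# A made-up DTD fragment for illustration only.
my $dtd = <<'DTD';
<!ELEMENT book (title, author+, sect1*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT sect1 (title, para*)>
<!ELEMENT para (#PCDATA)>
DTD

my $doc = DeviousPorpoises->new($dtd);
$doc->set( title => 'Devious Porpoises' );

my @names = $doc->elements;
my $title = $doc->get('title');
print join( ', ', @names ), "\n";
print "$title\n";
```

    Generic `get`/`set` methods checked against the DTD avoid generating one method per element, which sidesteps the namespace worry raised earlier in the thread.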
    Get Strong Together!!
Re: handling my PerlSAX handler
by mirod (Canon) on May 21, 2001 at 20:52 UTC

    Did you have a look at XML::Parser's Objects style? It turns each element into an object whose type is the element name. You can even use the Pkg option to create all of the objects in a separate package. Wouldn't this work for you? At least you could use the code there as a start for your module.

Re: handling my PerlSAX handler
by markjugg (Curate) on May 21, 2001 at 20:38 UTC
    Vynce,

    As I understand, the WDDX project is already in the business of converting between XML and the native language data structures, including Perl. It sounds slightly different than what you have in mind, but worth investigating if you haven't explored it yet.

    -mark

Re: handling my PerlSAX handler
by Anonymous Monk on May 23, 2001 at 01:38 UTC
    I think the idea seems sound enough, just as long as you allow some way to map XML elements to objects other than by the element name string itself. In other words, storing a user-defined (possibly through a higher level meta-language) hash entry such as <chapter> --> Publish::Chapter seems like a good idea, but parsing the element name string and automatically mapping that to a class seems like an inherently bad idea.

    Another thing to keep in mind is that XML by its nature is inherently structured and Perl by its nature is inherently unstructured so expect monstrous amounts of wrapping.

    One thing I would like would be the ability to support persistence using XML, i.e. mapping from XML to object and back to XML.
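    The "back to XML" half could be sketched roughly as follows. Everything here is an assumption made for illustration: the node layout (`name`, `attrs`, `children`), the `to_xml` and `escape_xml` helpers, and the idea that text children are stored as plain strings.

```perl
use strict;
use warnings;

# Hypothetical node layout: { name => ..., attrs => {...}, children => [...] },
# where each child is either another node or a plain text string.

# Escape the characters that are special in XML text and attribute values.
sub escape_xml {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Recursively serialize a node and its children.
sub to_xml {
    my ($node) = @_;
    return escape_xml($node) unless ref $node;    # plain text child
    my $attrs = join '',
        map { sprintf ' %s="%s"', $_, escape_xml( $node->{attrs}{$_} ) }
        sort keys %{ $node->{attrs} || {} };
    my $body = join '', map { to_xml($_) } @{ $node->{children} || [] };
    return "<$node->{name}$attrs>$body</$node->{name}>";
}

my $chapter = bless {
    name     => 'CHAPTER',
    attrs    => { number => 1 },
    children => ['Call me <Ishmael> & co.'],
}, 'Publish::Chapter';

my $xml = to_xml($chapter);
print "$xml\n";    # <CHAPTER number="1">Call me &lt;Ishmael&gt; &amp; co.</CHAPTER>
```

    Sorting the attribute names keeps the output stable between runs, which makes a round trip easier to compare against the original document.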

    Anyway, my $.02