boo_radley has asked for the wisdom of the Perl Monks concerning the following question:

The Task

Creation of some incredibly large XML documents

The "Suggestion"

Wouldn't it be nice if we had a way to do, like, a CGI thing, where you just have methods and stuff to do the tag work for you?

Who drew the short straw?

boo_radley

The Investigation

I've tried to use XML::Generator and XML::AutoWriter, but both fail to make on Win32 with fairly serious errors. Failing any other options, I have put together the following package, which seeks to mimic CGI's OO interface.

The Code

I found SGML::DTD, dusted it off and put it to work. Does anyone have other recommendations for DTD processing and reporting modules?
package XML::Generator::FromDTD;
use strict;
use SGML::DTD;

sub makepackage {
    # all of this symbolic referencing makes me nervous. it's like
    # disabling a wood chipper's safety.
    no strict 'refs';
    *{"$_[0]::new"} = sub {
        my $proto = shift;
        my $class = ref($proto) || $proto;
        my $self  = {};
        bless ($self, $class);
        return $self;
    };
}

sub sub_maker {
    my $s = shift;
    my $attrlist = join " ", @{+shift};
    sub {
        my $self = shift;
        my %attrs;
        my @attrs;
        %attrs = %{+shift} if ref $_[0] eq "HASH";
        foreach my $thisattr (keys %attrs) {
            # leave it in, so...
            # (match whole names only, so "type" doesn't match "subtype")
            warn "no attribute $thisattr\n"
                unless $attrlist =~ /\b\Q$thisattr\E\b/;
            # is there text?
            if ($attrs{$thisattr}) {
                # if so, make foo="bar",
                push @attrs, "$thisattr=\"" . $attrs{$thisattr} . "\"";
            } else {
                # otherwise, just add an attribute foo
                push @attrs, "$thisattr";
            }
        }
        # assemble attribute text
        my $attrtext = join (" ", @attrs);
        $attrtext = " " . $attrtext if $attrtext;
        # return an empty tag if there's no text to go with it.
        return "<$s$attrtext/>" if (@_ == 0);
        # if there's an arrayref, make a tag for each element
        # yes, anything after $_[0] gets ignored.
        if (ref ($_[0]) eq "ARRAY") {
            return join "", map {"<$s$attrtext>$_</$s>"} @{ $_[0] };
        }
        # otherwise, return the whole length of @_ inside a tag.
        return "<$s$attrtext>@_</$s>";
    }
}

sub createFromDTD {
    my ($fn, $packagename) = @_;
    my $dtd = new SGML::DTD;
    my %elements;
    # "or", not "||": the latter binds to $fn, so the die never fires
    open (FH, $fn) or die "can't open $fn: $!";
    $dtd->read_dtd(\*FH) or die "can't parse $fn";
    close FH;
    # now that the file's been read and parsed by SGML::DTD object,
    # begin the real work
    makepackage ($packagename);
    {
        no strict "refs";
        foreach ($dtd->get_elements()) {
            # read in the attributes of each element
            my %hr = $dtd->get_elem_attr($_);
            my @attributes = keys %hr;
            # make a new sub in the user specified package
            # named after the current element
            *{"${packagename}::$_"} = sub_maker ($_, \@attributes);
            # create a list of attributes for later use
            # this hash is also used in the '_methods' method, below
            $elements{$_} = \@attributes;
        }
        *{"${packagename}::_methods"} = sub { return keys %{$_[0]->{_elements}} };
        *{"${packagename}::_attributes"} = sub {
            my $self = $_[0];
            print "got $_[1]\n";
            return @{$self->{_elements}{$_[1]}};
        };
    }
    # finish off the newly created package
    my $tmp = $packagename->new;
    $$tmp{_elements} = \%elements;
    return $tmp;
}

1;

Explained

This takes a DTD and makes a brand new package & object from it. The object's methods are primarily the elements listed in the DTD. There are two helper methods available to you: _methods, which lists the element methods the class is aware of, and _attributes, which lists the attributes valid for a given tag.
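The core trick, installing closures into a package chosen at run time, can be seen in isolation in the sketch below. It is a stripped-down illustration rather than the module itself: the helper make_tag_sub, the package name Demo::Tags and the two element names are made up for the example, and no DTD is read.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that wraps its arguments in <$tag>.
sub make_tag_sub {
    my ($tag) = @_;
    return sub {
        my $self = shift;
        return "<$tag/>" unless @_;      # no content: empty tag
        return "<$tag>@_</$tag>";        # content joined with spaces
    };
}

# Install a constructor and one method per element name into a package
# chosen at run time ("Demo::Tags" and the element list are made up).
my $package = 'Demo::Tags';
{
    no strict 'refs';    # symbolic references are needed for the glob assignment
    *{"${package}::new"} = sub { return bless {}, shift };
    *{"${package}::$_"}  = make_tag_sub($_) for qw(recipe head);
}

my $t = $package->new;
print $t->recipe( $t->head('Breakfast burrito') ), "\n";
```

Each generated method closes over its own $tag, which is why one factory sub is enough for every element in the DTD.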
So, if we have a DTD that looks like :
<!ELEMENT cookbook (recipe+)>
<!ELEMENT recipe (head?, (ingredientList|procedure|para)*)>
<!ATTLIST recipe serves CDATA #IMPLIED>
<!ELEMENT head (#PCDATA)>
<!ELEMENT ingredientlist (ingredient+)>
<!ELEMENT ingredient (#PCDATA|food|quantity)*>
<!ELEMENT procedure (step+)>
<!ELEMENT food (#PCDATA)>
<!ATTLIST food
    type (veg|prot|fat|sugar|flavour|unspec) "unspec"
    calories (high|medium|low|none|unknown) "unknown">
<!ELEMENT quantity EMPTY>
<!ATTLIST quantity
    value CDATA #REQUIRED
    units CDATA #IMPLIED
    exact (Y|N) "N">
<!ELEMENT para (#PCDATA|food)*>
<!ELEMENT step (#PCDATA|food)*>

This module will allow us to do the following:
$r=XML::Generator::FromDTD::createFromDTD ("cookbook.dtd","cookbook");
print $r->cookbook (
    $r->recipe({serves=>"one"},
        $r->head("Breakfast burrito"),
        $r->ingredientlist (
            $r->ingredient (
                $r->food ({type=>"unspec", calories=>"unknown"},
                    [ "tortilla", "egg","hash browns","cheese","bacon"]),
            )
        )
    )
);

The food method in the above snippet shows the use of an anonymous hashref for the attributes and an anonymous arrayref for the content. Any of ::FromDTD's generated methods will wrap each element of an arrayref in its own pair of tags. This means that tortilla, egg, hash browns, cheese and bacon each get their own tag :
<food calories="unknown" type="unspec">tortilla</food>
<food calories="unknown" type="unspec">egg</food>
<food calories="unknown" type="unspec">hash browns</food>
<food calories="unknown" type="unspec">cheese</food>
<food calories="unknown" type="unspec">bacon</food>
This code will produce the following XML fragment. Note that the XML declaration and document type declaration are not produced by the module :
<cookbook>
  <recipe serves="one">
    <head>Breakfast burrito</head>
    <ingredientlist>
      <ingredient>
        <food calories="unknown" type="unspec">tortilla</food>
        <food calories="unknown" type="unspec">egg</food>
        <food calories="unknown" type="unspec">hash browns</food>
        <food calories="unknown" type="unspec">cheese</food>
        <food calories="unknown" type="unspec">bacon</food>
      </ingredient>
    </ingredientlist>
  </recipe>
</cookbook>
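Since the module leaves out the XML declaration and the document type declaration, the caller has to supply them. One way to do that is sketched here. The helper name with_prolog is hypothetical and not part of the module, and it assumes the DTD is referenced by a SYSTEM identifier such as the cookbook.dtd file used above.

```perl
use strict;
use warnings;

# Hypothetical helper: prepend an XML declaration and a DOCTYPE line to a
# fragment. $root must match the fragment's outermost element name.
sub with_prolog {
    my ($root, $system_id, $fragment) = @_;
    return qq{<?xml version="1.0"?>\n}
         . qq{<!DOCTYPE $root SYSTEM "$system_id">\n}
         . $fragment . "\n";
}

print with_prolog(
    'cookbook', 'cookbook.dtd',
    '<cookbook><recipe><head>Breakfast burrito</head></recipe></cookbook>',
);
```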

Questions

Except for the modules that I listed under The Investigation, I haven't seen anything that will reproduce this effect. Is this worthwhile to flesh out? Have I missed other modules? Is PerlSGML still a good suite of tools to use? There's been a lot of drive for XML in Perl lately, and I considered XML::Doctype and XML::LibXML::DTD, but found SGML::DTD to be the shortest path between me and my goal. And as always, I welcome your comments, criticisms and suggestions.

Re: RFC : XML::Generator::FromDTD
by hagus (Monk) on May 21, 2002 at 01:53 UTC
    FWIW, I use XML::Xerces::DOMParser, and build things that way. I write wrapper classes for each broad category of XML tag I want to generate.

    These classes have get/set methods which just manage some anon hashes in my class's instance data. I also have two methods - getXML and parseXML, which output the entire tree and snarf in a new tree, respectively.

    getXML essentially converts my perl-friendly hash representation of the XML tree into a Xerces DOM, then returns the text representation. parseXML does the opposite - snarfs into a DOM and moves it into a nice perl hash. I've never used a DTD, so I'm not sure how far my idea would get you on that front. But as far as a way for "methods and stuff to do the tag work for you", I think it works okay.
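    A minimal sketch of the hash-to-text direction follows, in plain Perl rather than through a Xerces DOM, so it shows the shape of the idea and not the actual classes described above. The node layout (name, attrs, children) and the helper names escape_xml and get_xml are assumptions made up for the example.

```perl
use strict;
use warnings;

# Hypothetical node layout: { name => ..., attrs => {...}, children => [...] },
# where each child is either another node hash or a plain text string.
sub escape_xml {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

sub get_xml {
    my ($node) = @_;
    return escape_xml($node) unless ref $node;    # plain text child

    my $attrs     = $node->{attrs} || {};
    my $attr_text = '';
    for my $key (sort keys %$attrs) {             # sorted for stable output
        $attr_text .= sprintf ' %s="%s"', $key, escape_xml($attrs->{$key});
    }

    my @children = @{ $node->{children} || [] };
    return "<$node->{name}$attr_text/>" unless @children;
    return "<$node->{name}$attr_text>"
         . join('', map { get_xml($_) } @children)
         . "</$node->{name}>";
}

my $tree = {
    name     => 'recipe',
    attrs    => { serves => 'one' },
    children => [ { name => 'head', children => ['Eggs & bacon'] } ],
};
print get_xml($tree), "\n";
```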

    Out of interest, how can a DTD help me? What does it bring to the table?

    --
    Ash OS durbatulk, ash OS gimbatul,
    Ash OS thrakatulk, agh burzum-ishi krimpatul!
    Uzg-Microsoft-ishi amal fauthut burguuli.

      hagus sez :
      Out of interest, how can a DTD help me? What does it bring to the table?


      I'll quote from the introduction to Learning XML, since it's got a good concise explanation :

      Unambiguous Structure

      XML takes a hard line when it comes to structure. A document should be marked up in such a way that there are no two ways to interpret the names, order and hierarchy of the elements. This vastly reduces errors and code complexity. Programs don't have to take an educated guess or try to fix syntax mistakes the way HTML browsers often do, as there are no surprises of one XML processor creating a different result from another...
      The DTD is a blueprint for document structure. An XML schema can restrict the types of data that are allowed to go inside elements...
      So, before you start writing an XML document, you should consult the corresponding DTD so you know which tags can accept what attributes, and what those attributes should be, which tags need to be empty and so on. When your XML document is parsed, it gets (should get?) validated against whatever DTD is found in the doctype declaration.
      My hope was that, by creating a module that created an object whose methods complied with (or mostly with, I wanted to run it by pm first and see if the idea was viable) a DTD, I could save time & effort debugging later on down the road.
Re: RFC : XML::Generator::FromDTD
by ChemBoy (Priest) on May 22, 2002 at 04:34 UTC

    I took a look at doing precisely this a few months ago, but determined that I didn't have the time and expertise for it (I got XML::ValidWriter to install, but nothing much more). All I can say is, please, write it! It's too late for the job I needed it for, but future generations will still thank you profusely (assuming, of course, that future generations are allowed to see it--obviously not a given).

    In my investigations (such as they were) I also came to the conclusion that SGML::DTD was the way to go, though you seem in any case to have done a more thorough job of researching the topic than I did. I do have some reservations about code that raises four different warnings at compile time, but it is somewhat aged (last change was in 1997), so perhaps that's to be expected.

    Good luck!



    If God had meant us to fly, he would *never* have given us the railroads.
        --Michael Flanders