The Task

Creation of some incredibly large XML documents

The "Suggestion"

Wouldn't it be nice if we had a way to do, like, a CGI thing, where you just have methods and stuff to do the tag work for you?

Who drew the short straw?

boo_radley

The Investigation

I've tried to use XML::Generator and XML::AutoWriter, but both fail to build on Win32 with fairly serious errors. Lacking any other options, I have put together the following package, which tries to mimic CGI's OO interface.

The Code

I found SGML::DTD, dusted it off and put it to work. Does anyone have other recommendations for DTD processing and reporting modules?
package XML::Generator::FromDTD;
use strict;
use warnings;
use SGML::DTD;

# Give the generated package a plain hash-based constructor.
sub makepackage {
    my $package = shift;
    # All of this symbolic referencing makes me nervous; it's like disabling
    # a wood chipper's safety. Keep "no strict 'refs'" as narrow as possible.
    no strict 'refs';
    *{"${package}::new"} = sub {
        my $proto = shift;
        my $class = ref($proto) || $proto;
        return bless {}, $class;
    };
}

# Escape the characters that may not appear literally in a quoted attribute value.
sub _escape_attr {
    my $v = shift;
    $v =~ s/&/&amp;/g;    # must run first so the entities below aren't re-escaped
    $v =~ s/</&lt;/g;
    $v =~ s/"/&quot;/g;
    return $v;
}

# Return a closure that renders element $s; $attrlist is an arrayref of the
# attribute names the DTD declares for that element.
sub sub_maker {
    my ($s, $attrlist) = @_;
    # exact-name lookup (a regex match on a joined string also accepts substrings)
    my %valid = map { $_ => 1 } @$attrlist;
    return sub {
        my $self = shift;
        my %attrs;
        %attrs = %{ +shift } if ref $_[0] eq 'HASH';
        my @attrs;
        # sorted, so the output is the same from run to run
        foreach my $thisattr (sort keys %attrs) {
            # an unknown attribute is left in, but we complain about it
            warn "no attribute $thisattr for <$s>\n" unless $valid{$thisattr};
            # XML has no minimized attributes, so always emit name="value"
            my $value = defined $attrs{$thisattr} ? $attrs{$thisattr} : '';
            push @attrs, qq{$thisattr="} . _escape_attr($value) . q{"};
        }
        # assemble attribute text
        my $attrtext = join ' ', @attrs;
        $attrtext = " $attrtext" if $attrtext;
        # return an empty tag if there's no text to go with it
        return "<$s$attrtext/>" unless @_;
        # if there's an arrayref, make a tag for each element;
        # yes, anything after $_[0] gets ignored
        if (ref $_[0] eq 'ARRAY') {
            return join '', map { "<$s$attrtext>$_</$s>" } @{ $_[0] };
        }
        # otherwise, return all of @_ (joined with spaces) inside a tag
        return "<$s$attrtext>@_</$s>";
    };
}

# Read the DTD in $fn and build package $packagename with one method per element.
sub createFromDTD {
    my ($fn, $packagename) = @_;
    my $dtd = SGML::DTD->new;
    my %elements;
    # "open FH, $fn || die $!" never dies: the || binds to $fn, not to open
    open(FH, '<', $fn) or die "can't open $fn: $!";
    $dtd->read_dtd(\*FH) or die "can't read DTD from $fn\n";
    close(FH);
    # now that the file's been read and parsed by the SGML::DTD object,
    # begin the real work
    makepackage($packagename);
    {
        no strict 'refs';
        foreach my $elem ($dtd->get_elements()) {
            # read in the attributes of each element
            my %hr = $dtd->get_elem_attr($elem);
            my @attributes = keys %hr;
            # make a new sub in the user-specified package, named after the element
            # (the old call also passed a stray get_elem_attr result here)
            *{"${packagename}::$elem"} = sub_maker($elem, \@attributes);
            # remember the attributes; _methods and _attributes read this table
            $elements{$elem} = \@attributes;
        }
        *{"${packagename}::_methods"} = sub { return keys %{ $_[0]->{_elements} } };
        *{"${packagename}::_attributes"} = sub {
            my ($self, $elem) = @_;
            return @{ $self->{_elements}{$elem} || [] };
        };
    }
    # finish off the newly created package
    my $tmp = $packagename->new;
    $tmp->{_elements} = \%elements;
    return $tmp;
}

1;
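Stripped of the DTD parsing, the technique above is a closure factory plus typeglob assignment. The sketch below shows only that part, so it runs without SGML::DTD: a hard-coded list stands in for the parsed DTD, and make_tag_sub, @elements and the package Demo::Cookbook are hypothetical names. It deliberately leaves out attribute validation and escaping.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the element names SGML::DTD would report.
my @elements = qw(head food quantity);

# Return a closure that renders element $tag (no validation or escaping here).
sub make_tag_sub {
    my ($tag) = @_;
    return sub {
        my $self     = shift;
        my %attrs    = ref $_[0] eq 'HASH' ? %{ +shift } : ();
        my $attrtext = join '', map { qq{ $_="$attrs{$_}"} } sort keys %attrs;
        return "<$tag$attrtext/>" unless @_;              # no content: empty tag
        if (ref $_[0] eq 'ARRAY') {                       # one tag per item
            return join '', map { "<$tag$attrtext>$_</$tag>" } @{ $_[0] };
        }
        return "<$tag$attrtext>@_</$tag>";                # everything else
    };
}

# Install a constructor and one method per element into Demo::Cookbook.
{
    no strict 'refs';
    *{"Demo::Cookbook::new"} = sub { my $class = shift; return bless {}, $class };
    *{"Demo::Cookbook::$_"}  = make_tag_sub($_) for @elements;
}

my $r = Demo::Cookbook->new;
print $r->head('Breakfast burrito'), "\n";    # <head>Breakfast burrito</head>
print $r->quantity({ value => 2 }), "\n";     # <quantity value="2"/>
print $r->food({ type => 'veg' }, ['egg', 'cheese']), "\n";
# <food type="veg">egg</food><food type="veg">cheese</food>
```

Each closure captures its own copy of $tag, which is why one generic anonymous sub can serve every element in the DTD.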

Explained

This takes a DTD and makes a brand new package & object from it. The object's methods are primarily the elements listed in the DTD. There are two helper methods available to you: _methods, which lists the methods (one per element) the class is aware of, and _attributes, which, given an element name, lists the attributes valid for that tag.
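The bookkeeping behind those two helpers is small: the object keeps an element table in $self->{_elements} (element name => arrayref of attribute names), and the helpers just read it. Here is a standalone sketch of that idea; the package Demo::Helpers and its hash-taking constructor are hypothetical, since the real module fills the table from the DTD inside createFromDTD.

```perl
use strict;
use warnings;

package Demo::Helpers;

# Hypothetical constructor: takes element => [attribute names] pairs and
# stores them the way createFromDTD does, under $self->{_elements}.
sub new {
    my ($class, %elements) = @_;
    return bless { _elements => \%elements }, $class;
}

# One name per element, i.e. per generated tag method (hash order, unsorted).
sub _methods { return keys %{ $_[0]->{_elements} } }

# Attribute names declared for one element; an empty list for unknown names.
sub _attributes {
    my ($self, $elem) = @_;
    return @{ $self->{_elements}{$elem} || [] };
}

package main;

my $r = Demo::Helpers->new(
    recipe   => ['serves'],
    food     => [qw(type calories)],
    quantity => [qw(value units exact)],
);
print join(',', sort { $a cmp $b } $r->_methods), "\n";    # food,quantity,recipe
print join(',', $r->_attributes('food')), "\n";            # type,calories
```

Because _methods returns hash keys, its order is arbitrary; sort the result when you need stable output.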
So, if we have a DTD that looks like this:
<!ELEMENT cookbook (recipe+)>
<!ELEMENT recipe (head?, (ingredientlist|procedure|para)*)>
<!ATTLIST recipe serves CDATA #IMPLIED>
<!ELEMENT head (#PCDATA)>
<!ELEMENT ingredientlist (ingredient+)>
<!ELEMENT ingredient (#PCDATA|food|quantity)*>
<!ELEMENT procedure (step+)>
<!ELEMENT food (#PCDATA)>
<!ATTLIST food
    type     (veg|prot|fat|sugar|flavour|unspec) "unspec"
    calories (high|medium|low|none|unknown)      "unknown">
<!ELEMENT quantity EMPTY>
<!ATTLIST quantity
    value CDATA #REQUIRED
    units CDATA #IMPLIED
    exact (Y|N) "N">
<!ELEMENT para (#PCDATA|food)*>
<!ELEMENT step (#PCDATA|food)*>

This module will allow us to do the following:
my $r = XML::Generator::FromDTD::createFromDTD("cookbook.dtd", "cookbook");
print $r->cookbook(
    $r->recipe({serves => "one"},
        $r->head("Breakfast burrito"),
        $r->ingredientlist(
            $r->ingredient(
                $r->food({type => "unspec", calories => "unknown"},
                    ["tortilla", "egg", "hash browns", "cheese", "bacon"]),
            )
        )
    )
);

The food call in the above snippet shows both argument forms. A leading anonymous hashref supplies the attributes, and when the next argument is an array ref, any of ::FromDTD's generated methods wraps each of its elements in a separate tag. This means that tortilla, egg, hash browns, cheese and bacon each get their own tag:
<food calories="unknown" type="unspec">tortilla</food>
<food calories="unknown" type="unspec">egg</food>
<food calories="unknown" type="unspec">hash browns</food>
<food calories="unknown" type="unspec">cheese</food>
<food calories="unknown" type="unspec">bacon</food>
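The attribute text in each of those tags comes from the hashref. That step can be sketched on its own as follows; attr_text and escape_attr are hypothetical helpers, not part of the posted module. The sketch sorts the keys so the output is stable and escapes &, < and " in values, since XML does not allow them literally inside a double-quoted attribute.

```perl
use strict;
use warnings;

# Escape the characters that may not appear literally in a
# double-quoted XML attribute value.
sub escape_attr {
    my ($v) = @_;
    $v =~ s/&/&amp;/g;    # must come first so later entities aren't re-escaped
    $v =~ s/</&lt;/g;
    $v =~ s/"/&quot;/g;
    return $v;
}

# Hypothetical helper: turn { type => 'unspec', calories => 'unknown' }
# into ' calories="unknown" type="unspec"' (leading space, sorted keys).
sub attr_text {
    my ($attrs) = @_;
    return join '', map {
        my $v = defined $attrs->{$_} ? $attrs->{$_} : '';
        qq{ $_="} . escape_attr($v) . q{"};
    } sort keys %$attrs;
}

print '<food' . attr_text({ type => 'unspec', calories => 'unknown' }) . ">egg</food>\n";
# <food calories="unknown" type="unspec">egg</food>
print '<quantity' . attr_text({ value => '2', units => 'cups & "spoons"' }) . "/>\n";
# <quantity units="cups &amp; &quot;spoons&quot;" value="2"/>
```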
The whole cookbook call produces the following XML fragment (indented here for readability). Note that the module does not produce the XML declaration or the document type declaration:
<cookbook>
  <recipe serves="one">
    <head>Breakfast burrito</head>
    <ingredientlist>
      <ingredient>
        <food calories="unknown" type="unspec">tortilla</food>
        <food calories="unknown" type="unspec">egg</food>
        <food calories="unknown" type="unspec">hash browns</food>
        <food calories="unknown" type="unspec">cheese</food>
        <food calories="unknown" type="unspec">bacon</food>
      </ingredient>
    </ingredientlist>
  </recipe>
</cookbook>
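Since the module stops at the fragment, the caller has to supply the prolog. A minimal sketch, assuming the DTD is referenced by the relative system identifier cookbook.dtd and the document is written as UTF-8; with_prolog is a hypothetical helper, and the fragment string is a shortened stand-in for what $r->cookbook(...) returns.

```perl
use strict;
use warnings;

# Hypothetical helper: prepend an XML declaration and a DOCTYPE that
# points at an external DTD to a fragment produced by the generated methods.
sub with_prolog {
    my ($root, $dtd_file, $fragment) = @_;
    return qq{<?xml version="1.0" encoding="UTF-8"?>\n}
         . qq{<!DOCTYPE $root SYSTEM "$dtd_file">\n}
         . $fragment . "\n";
}

# Stand-in for the string $r->cookbook(...) would return.
my $fragment = '<cookbook><recipe serves="one"><head>Breakfast burrito</head></recipe></cookbook>';
print with_prolog('cookbook', 'cookbook.dtd', $fragment);
```

If any of the text is non-ASCII, also set binmode STDOUT, ':encoding(UTF-8)' so the bytes written match the declared encoding.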

Questions

Apart from the modules I listed under The Investigation, I haven't seen anything that does this. Is it worth fleshing out? Have I missed other modules? Is PerlSGML still a good suite of tools to use? There's been a lot of drive for XML in Perl lately, and I considered XML::Doctype and XML::LibXML::Dtd, but found SGML::DTD to be the shortest path between me and my goal. And as always, I welcome your comments, criticisms and suggestions.
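For DTDs as simple as the cookbook one, the SGML::DTD dependency could even be dropped. The sketch below is a swapped-in technique, not anything PerlSGML provides: a few regexes in a hypothetical read_simple_dtd that return element name => arrayref of attribute names, the same shape as the %elements table createFromDTD builds. It is not a real DTD parser; comments, parameter entities, conditional sections and a ">" inside a quoted default will all confuse it.

```perl
use strict;
use warnings;

# Hypothetical, deliberately crude DTD reader. Returns a hashref of
# element name => arrayref of attribute names (in declaration order).
# Only plain <!ELEMENT> and <!ATTLIST> declarations are understood.
sub read_simple_dtd {
    my ($text) = @_;
    my $name = qr/[\w.:-]+/;
    my %elements;

    # every declared element starts out with no attributes
    while ($text =~ /<!ELEMENT\s+($name)/g) {
        $elements{$1} ||= [];
    }

    # an ATTLIST body is a run of "name type default" triples
    my $type    = qr/CDATA|ID|IDREFS?|NMTOKENS?|ENTITY|ENTITIES|\([^)]*\)/;
    my $default = qr/#REQUIRED|#IMPLIED|(?:#FIXED\s+)?"[^"]*"/;
    while ($text =~ /<!ATTLIST\s+($name)\s+(.*?)>/gs) {
        my ($elem, $body) = ($1, $2);
        while ($body =~ /($name)\s+(?:$type)\s+(?:$default)/g) {
            push @{ $elements{$elem} }, $1;
        }
    }
    return \%elements;
}

my $dtd = <<'DTD';
<!ELEMENT head (#PCDATA)>
<!ELEMENT food (#PCDATA)>
<!ATTLIST food
    type     (veg|prot|fat|sugar|flavour|unspec) "unspec"
    calories (high|medium|low|none|unknown)      "unknown">
<!ELEMENT quantity EMPTY>
<!ATTLIST quantity value CDATA #REQUIRED units CDATA #IMPLIED exact (Y|N) "N">
DTD

my $elements = read_simple_dtd($dtd);
print "$_: @{ $elements->{$_} }\n" for sort keys %$elements;
# food: type calories
# head:
# quantity: value units exact
```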

In reply to RFC : XML::Generator::FromDTD by boo_radley
