In this meditation, we will embed a mini-language for building XML documents into Perl. Our goal is to see how much syntax we can remove in pursuit of what Damian Conway calls "sufficiently advanced technologies." We want to make building XML just like writing native Perl:
```perl
# html {
#     head { title { text "Title" } };
#     body {
#         p { class_ "warning"; text "paragraph" }
#     };
# }
```

There is nothing particularly novel about this approach; similar libraries exist for many programming languages. Our implementation, however, will stress Perlishness and simplicity. To keep the meditation uncluttered, we will not package the code as a module but will instead present it directly.

Here is our game plan. We will represent XML documents as trees of nested arrays and then render the trees as XML. A node in our tree will be either text (represented as a string) or an element (represented as a triple of the form [name, attributes, children_nodes]). Attributes will be pairs of the form [name, value]. (We will ignore namespaces, XML declarations, and other aspects of XML generation that don't add much to the meditation.)
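To make the representation concrete, here is a small tree written out by hand, together with a hypothetical helper, text_of (not part of the code below), that walks it. It is only a sketch of the data structure: undef stands for "no attributes" or "no children", and ref() is what distinguishes an element from a text node.

```perl
use strict;
use warnings;

# <p class="warning">Hello, <b>world</b></p>, built by hand.
# An element is [name, attributes, children]; a text node is a plain string.
# undef stands for "no attributes" (or "no children").
my $tree = [
    'p',
    [ [ 'class', 'warning' ] ],
    [ 'Hello, ', [ 'b', undef, ['world'] ] ],
];

# Hypothetical helper: concatenate the text nodes under $node in document order.
sub text_of {
    my ($node) = @_;
    return $node unless ref $node;    # a plain string is a text node
    my ( undef, undef, $children ) = @$node;
    return join '', map { text_of($_) } @{ $children || [] };
}

print text_of($tree), "\n";    # Hello, world
```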

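To build a document, we will call functions that append elements, attributes, and text to the active node in the tree. An element-creating function also makes its new element the active node while that element's contents are being built, and then restores the previous active node.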
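The "make it active, then restore" step maps directly onto Perl's local, which the code below applies to $__frag. Here is a minimal sketch of just that mechanism, using a hypothetical global $active and a hypothetical within function in place of the real builders:

```perl
use strict;
use warnings;

our $active = 'root';    # hypothetical stand-in for $__frag
my @trace;

# Run $block with $active temporarily set to $name. local restores the
# previous value automatically when the sub exits (even via die).
sub within {
    my ( $name, $block ) = @_;
    local $active = $name;
    $block->();
}

push @trace, $active;                                    # root
within( 'outer', sub {
    push @trace, $active;                                # outer
    within( 'inner', sub { push @trace, $active } );     # inner
    push @trace, $active;                                # outer again
} );
push @trace, $active;                                    # root again

print "@trace\n";    # root outer inner outer root
```

Because the restore is tied to scope exit, nesting takes care of itself: no builder has to remember which node was active before it ran.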

To make it all seem more natural, we will give doc and the element-creating functions the (&) prototype. A function with this prototype accepts a bare block as its first argument, so we can use braces to represent nesting when calling the functions:

```perl
# doc {
#     my_elem {
#         # children go here
#     };
# };
```
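Here is the (&) prototype in isolation. In this sketch, nest is a hypothetical function that only tracks nesting depth; the point is that the bare braces after its name are compiled into an anonymous sub and passed in as a code reference, with no sub keyword needed. The prototype must already be known when the call is compiled, which is why nest is defined before it is used.

```perl
use strict;
use warnings;

my @log;
my $depth = 0;

# With the (&) prototype, a bare block in first position is
# passed in as a code reference.
sub nest(&) {
    my ($block) = @_;
    $depth++;
    $block->();
    $depth--;
}

nest {
    push @log, "depth $depth";
    nest { push @log, "depth $depth" };
};

print join( ", ", @log ), "\n";    # depth 1, depth 2
```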

Likewise, the attribute-creating functions and text get the ($) prototype, which lets us call them without parentheses:

```perl
# my_elem {
#     text "some text";
#     my_attr_ "value";
# };
```
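One side effect of ($) is worth knowing: besides dropping the parentheses, it imposes scalar context on the argument. The hypothetical record function below shows both behaviors. By the same rule, writing text @lines would append the number of lines rather than the lines themselves, so pass a single string (or a join) instead.

```perl
use strict;
use warnings;

my @seen;

# ($) lets callers omit parentheses, and forces scalar context
# on the argument.
sub record($) { push @seen, $_[0] }

record "first";          # parsed as record("first")
my @words = qw(a b c);
record @words;           # scalar context: passes the count (3), not the words

print "@seen\n";         # first 3
```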

With the game plan in mind, let's work top down:

```perl
our $__frag;  # points to fragment under active construction

sub doc(&) {
    my ($content_fn) = @_;
    local $__frag = [undef, undef, undef];
    $content_fn->();
    $__frag->[2][0];
}

sub _elem {
    my ($elem_name, $content_fn) = @_;
    # an element is represented by the triple [name, attrs, children]
    my $elem = [$elem_name, undef, undef];
    do { local $__frag = $elem; $content_fn->() };
    push @{$__frag->[2]}, $elem;
}

sub _attr {
    my ($attr_name, $val) = @_;
    push @{$__frag->[1]}, [$attr_name, $val];
}

sub text($) {
    push @{$__frag->[2]}, @_;
}
```
The functions _elem and _attr are helpers for the following function, which embeds a custom XML vocabulary into Perl by creating a Perl function for each of the vocabulary's elements and attributes. (Attribute functions get a trailing underscore, as in class_ and style_.)
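```perl
sub define_vocabulary {
    my ($elems, $attrs) = @_;
    # for each element NAME, compile:   sub NAME(&)  { _elem('NAME', @_) }
    eval "sub $_(&) { _elem('$_',\@_) }" for @$elems;
    # for each attribute NAME, compile: sub NAME_($) { _attr('NAME', @_) }
    eval "sub ${_}_(\$) { _attr('$_',\@_) }" for @$attrs;
}
```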
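To see what the string evals produce, here is a sketch that writes out by hand the two subs that define_vocabulary([qw(p)], [qw(class)]) would compile, and then builds a tiny document with them. The core builders are repeated from above so that the example runs on its own.

```perl
use strict;
use warnings;

our $__frag;    # fragment under active construction

# Core builders, repeated from above.
sub doc(&) {
    my ($content_fn) = @_;
    local $__frag = [ undef, undef, undef ];
    $content_fn->();
    $__frag->[2][0];
}
sub _elem {
    my ( $elem_name, $content_fn ) = @_;
    my $elem = [ $elem_name, undef, undef ];
    do { local $__frag = $elem; $content_fn->() };
    push @{ $__frag->[2] }, $elem;
}
sub _attr {
    my ( $attr_name, $val ) = @_;
    push @{ $__frag->[1] }, [ $attr_name, $val ];
}
sub text($) { push @{ $__frag->[2] }, @_ }

# Hand-written equivalents of the eval'd definitions for one element
# (p) and one attribute (class).
sub p(&)      { _elem( 'p', @_ ) }
sub class_($) { _attr( 'class', @_ ) }

my $tree = doc { p { class_ "warning"; text "paragraph" } };
# $tree is ['p', [['class', 'warning']], ['paragraph']]
```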
We can use define_vocabulary, for example, to embed a subset of XHTML into Perl:
```perl
BEGIN {
    define_vocabulary(
        [qw( html head title body h1 h2 h3 p img br )],
        [qw( src href class style )]
    );
}
```
(Wrapping the call in BEGIN ensures that the generated functions, and their prototypes, exist before the rest of the file is compiled. Otherwise Perl would not know to parse html { ... } as a call that takes a block.)
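A quick way to confirm what BEGIN buys us is to generate a couple of functions at compile time and inspect them with prototype. In this sketch, _elem and _attr are hypothetical stubs that just return a label, and em and strong are example element names; only the string-eval trick is the same as in define_vocabulary.

```perl
use strict;
use warnings;

# Hypothetical stubs standing in for the real helpers.
sub _elem { "elem:$_[0]" }
sub _attr { "attr:$_[0]" }

BEGIN {
    # Same string-eval trick as define_vocabulary, run at compile time.
    for my $name (qw(em strong)) {
        eval "sub $name(&) { _elem('$name', \@_) }";
        die $@ if $@;
    }
}

print prototype( \&em ), "\n";    # &
my $r = em { 1 };                 # parses as a call only because em(&) already exists
print "$r\n";                     # elem:em
```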

Let's try out our newly embedded vocabulary by dumping out the internal representation of a simple document:

```perl
my $my_doc = doc {
    html {
        head { title { text "Title" } };
        body {
            p { class_ "warning"; text "paragraph" }
        };
    }
};

use Data::Dumper;
$Data::Dumper::Indent = $Data::Dumper::Terse = 1;
print Dumper $my_doc;

# [
#   'html',
#   undef,
#   [
#     [
#       'head',
#       undef,
#       [
#         [
#           'title',
#           undef,
#           [
#             'Title'
#           ]
#         ]
#       ]
#     ],
#     [
#       'body',
#       undef,
#       [
#         [
#           'p',
#           [
#             [
#               'class',
#               'warning'
#             ]
#           ],
#           [
#             'paragraph'
#           ]
#         ]
#       ]
#     ]
#   ]
# ]
```

Good! That's just what we want.

All that is left is to render the internal representation as XML. Because the representation is so simple, this is straightforward. Here is a renderer that uses XML::Writer:
```perl
use XML::Writer;

sub render_via_xml_writer {
    my $doc    = shift;
    my $writer = XML::Writer->new(@_);    # extra args go to ->new()
    my $render_fn;
    $render_fn = sub {
        my $frag = shift;
        my ( $elem, $attrs, $children ) = @$frag;
        # attrs and children are undef when an element has none
        $writer->startTag( $elem, map {@$_} @{ $attrs || [] } );
        for ( @{ $children || [] } ) {
            # array refs are child elements; plain strings are text
            ref() ? $render_fn->($_) : $writer->characters($_);
        }
        $writer->endTag($elem);
    };
    $render_fn->($doc);
    $writer->end();
}
```
Now we can render our earlier document:
```perl
render_via_xml_writer( $my_doc, DATA_MODE => 1, UNSAFE => 1 );

# <html>
# <head>
# <title>Title</title>
# </head>
# <body>
# <p class="warning">paragraph</p>
# </body>
# </html>
```
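If XML::Writer isn't available, or a string is all you need, the same recursive walk can be written by hand. This is a minimal sketch, assuming compact output with no added whitespace; render_as_string and xml_escape are hypothetical names, not part of XML::Writer. Escaping is the part that is easy to get wrong, so it is done explicitly for both text and attribute values.

```perl
use strict;
use warnings;

# Escape XML's special characters. & must go first so the entities
# produced by the later substitutions are not escaped a second time.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Render a [name, attrs, children] tree (or a bare text string) as compact
# XML. attrs and children may be undef ("none"); an element whose content
# renders empty is written as a self-closing tag.
sub render_as_string {
    my ($node) = @_;
    return xml_escape($node) unless ref $node;
    my ( $name, $attrs, $children ) = @$node;

    my $attr_text = '';
    for my $pair ( @{ $attrs || [] } ) {
        my ( $k, $v ) = @$pair;
        $attr_text .= qq{ $k="} . xml_escape($v) . q{"};
    }

    my $body = join '', map { render_as_string($_) } @{ $children || [] };
    return length $body
        ? "<$name$attr_text>$body</$name>"
        : "<$name$attr_text/>";
}

print render_as_string( [ 'p', [ [ 'class', 'warning' ] ], ['a < b & c'] ] ), "\n";
# <p class="warning">a &lt; b &amp; c</p>
```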
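In most cases we will render documents shortly after creating them. We can "huffmanize" this common case with another helper, which supplies the outer doc for us and then renders the resulting tree. (Because doc has the (&) prototype, it will not accept a plain scalar holding a code reference; writing \&$docfn satisfies the prototype.)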
```perl
sub render_doc(&) {
    my $docfn = shift;
    render_via_xml_writer( doc( \&$docfn ), DATA_MODE => 1, UNSAFE => 1 );
}
```
Our final example shows the fruits of our labors. We have successfully embedded a custom subset of XHTML into Perl. Now we can use it to create XML fragments with very little syntactic overhead. Further, because our embedding is "just Perl," we can freely mix code and fragments to do the work of template engines:
```perl
render_doc {
    html {
        head { title { text "My grand document!" } };
        body {
            h1 { text "Heading" };
            p {
                class_ "first";    # attribute class="first"
                text "This is the first paragraph!";
                style_ "font: bold";    # another attr
            };
            # it's just Perl, so we can mix in other code
            for (2..5) {
                p { text "Plus paragraph number $_." }
            }
        };
    };
};

# <html>
# <head>
# <title>My grand document!</title>
# </head>
# <body>
# <h1>Heading</h1>
# <p class="first" style="font: bold">This is the first paragraph!</p>
# <p>Plus paragraph number 2.</p>
# <p>Plus paragraph number 3.</p>
# <p>Plus paragraph number 4.</p>
# <p>Plus paragraph number 5.</p>
# </body>
# </html>
```
Thanks for taking the time to read this meditation! If you find anything about it unclear, or can think of a way to improve my writing, please let me know.

Cheers
Tom


In reply to Embedding a mini-language for XML construction into Perl by tmoertel