
Until Sean Burke's articles, it never really occurred to me that HTML could be represented and understood as a tree. For example, given this HTML:


   
<html>
<head>
  <title>Doc 1</title>
</head>
<body>
  Stuff
  <hr>
  2000-08-17
</body>
</html>

the following tree results:

             html
             /      \
         head        body
        /          /   |  \
     title    "Stuff"  hr  "2000-08-17"
       |
    "Doc 1"
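To make the tree concrete without any CPAN module, here is a minimal sketch that builds the same tree by hand and prints it as an indented outline. It is not parser output. The layout (element nodes as hashes with a tag and a list of children, text nodes as plain strings) and the `outline` sub are illustrative assumptions. HTML::TreeBuilder builds HTML::Element objects instead. As in the diagram, whitespace-only text is left out.

```perl
use strict;
use warnings;

# Hand-built tree for the example page (not parser output): element nodes
# are hashes with a tag and an ordered list of children, text nodes are
# plain strings. Whitespace-only text is omitted, as in the diagram.
my $doc = {
    tag      => 'html',
    children => [
        { tag      => 'head',
          children => [ { tag => 'title', children => ['Doc 1'] } ] },
        { tag      => 'body',
          children => [ 'Stuff', { tag => 'hr', children => [] }, '2000-08-17' ] },
    ],
};

# Depth-first walk returning one indented line per node.
sub outline {
    my ($node, $depth) = @_;
    $depth //= 0;
    my $pad = '  ' x $depth;
    return $pad . qq{"$node"} unless ref $node;    # text node: a leaf
    return ($pad . $node->{tag},
            map { outline($_, $depth + 1) } @{ $node->{children} });
}

print "$_\n" for outline($doc);
```

Running it prints `html` first. `head` appears one level in, with `title` and `"Doc 1"` nested below it. Then comes `body`, followed by its three children at the same depth.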

This slide provides another example of representing an HTML document as a tree.

The popular Perl HTML templating systems do not treat HTML manipulation as tree manipulation, at least not directly. (Arguably every program and data structure can be represented as a tree; correct me on this.) Instead, they treat HTML as a character string and provide simple pseudo-operators, embedded in that string, to express its display logic.

While this is intuitive for programmers and designers alike, it is instructive to look at radically different approaches. In this article, I go through a number of common pseudo-operators and HTML manipulations and show how each can be interpreted as a tree-rewriting operation. I use Template (the Template Toolkit) as the reference point because it is well documented and provides a representative feature set.

if

The IF tag of the pseudo-language decides whether a node of the tree will remain or not:

[% IF age < 10 %]
   Hello, does your mother know you're
   using her AOL account?
[% ELSIF age < 18 %]
   Sorry, you're not old enough to enter
   (and too dumb to lie about your age)
[% ELSE %]
   Welcome
[% END %]

In the template HTML, we start with three candidate nodes:


ROOT
  child1:  Hello, does your mother know you're ...
  child2:  Sorry, you're not old enough to enter ...
  child3:  Welcome
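In plain Perl, with no HTML involved, choosing among these three children can be sketched as a list of (condition, child) pairs in IF / ELSIF / ELSE order, where the first true condition wins. The pair layout and `rewrite_if` are hypothetical illustrations, not part of Template or any other module.

```perl
use strict;
use warnings;

# Each candidate child of ROOT is paired with the condition guarding it,
# in IF / ELSIF / ELSE order; the ELSE branch's condition is always true.
my @branches = (
    [ sub { $_[0] < 10 }, "Hello, does your mother know you're ..." ],
    [ sub { $_[0] < 18 }, "Sorry, you're not old enough to enter ..." ],
    [ sub { 1 },          'Welcome' ],
);

# Rewrite ROOT's child list: keep only the first child whose condition
# holds; every other candidate is dropped.
sub rewrite_if {
    my ($value, @candidates) = @_;
    for my $branch (@candidates) {
        my ($test, $child) = @$branch;
        return ($child) if $test->($value);
    }
    return ();    # nothing survives (only possible without an ELSE)
}

my ($survivor) = rewrite_if(15, @branches);
print "$survivor\n";    # the ELSIF child
```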

Based on the conditional, we then delete or preserve the child nodes. I looked at a number of practical solutions for implementing this tree operation in Perl:

XML::Smart (promising and innovative, but young and in active development)
XML::XSH (powerful and professional, but does not build completely on Cygwin)
XML::LibXML (very nice, but I have a gap: I can search using XML::XPath, but I don't see how to integrate the search results with tree processing via LibXML)

In the end, I decided to use old faithful, HTML::Tree, for the examples (and to build the next generation of HTML::Seamstress).

So, here is how we handle this task using HTML tree rewrites. First we mark up the HTML so that we can find the nodes:

<span id="age_handler">
  <span id="under10">
    Hello, does your mother know you're
    using her AOL account?
  </span>
  <span id="under18">
    Sorry, you're not old enough to enter
    (and too dumb to lie about your age)
  </span>
  <span id="welcome">
    Welcome
  </span>
</span>

And now we process it with HTML::Tree:

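use strict;
use warnings;
use HTML::TreeBuilder;

# Read the template file name and the age from the command line.
my ($filename, $age) = @ARGV;

my $tree = HTML::TreeBuilder->new();
$tree->parse_file($filename) or die "Cannot read $filename: $!";
age_handler($tree, $age);   # plain sub call: age_handler is not an HTML::TreeBuilder method
print $tree->as_HTML;
$tree->delete;              # free the parse tree explicitly

# Keep the span for the matching age range and detach its two siblings.
sub age_handler {
    my ($tree, $age) = @_;
    my $SPAN = $tree->look_down('id', 'age_handler');
    if ($age < 10) {
        $SPAN->look_down('id', $_)->detach for qw(under18 welcome);
    } elsif ($age < 18) {
        $SPAN->look_down('id', $_)->detach for qw(under10 welcome);
    } else {
        $SPAN->look_down('id', $_)->detach for qw(under10 under18);
    }
}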
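The three branches of age_handler differ only in which span survives. A hypothetical helper, `ids_to_detach`, is sketched below. It is not part of HTML::Tree. It makes that single choice and returns the other two ids, so the decision can be tested without parsing any HTML.

```perl
use strict;
use warnings;

# The ids of the three candidate spans, in IF / ELSIF / ELSE order.
my @BRANCH_IDS = qw(under10 under18 welcome);

# Hypothetical helper: decide which span survives for this age and
# return the ids of the remaining spans, i.e. the ones to detach.
sub ids_to_detach {
    my ($age) = @_;
    my $keep = $age < 10 ? 'under10'
             : $age < 18 ? 'under18'
             :             'welcome';
    return grep { $_ ne $keep } @BRANCH_IDS;
}

print join(' ', ids_to_detach(12)), "\n";    # prints "under10 welcome"
```

With such a helper, the body of age_handler could shrink to a single line: `$SPAN->look_down('id', $_)->detach for ids_to_detach($age);`.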

Hmm, I'm worn out. Let's make this the first installment in an ongoing saga: how to do HTML templating via tree rewrites, the HTML::Seamstress approach.

Just one more comment: all of that

$SPAN->look_down('id', $_)->detach for ($this, $that);

should definitely be abstracted into some HTML::Stitchery such as:

$SPAN->KILL_CHILDREN (@children); # fodder for carnivore :)
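KILL_CHILDREN is only a proposed name. The minimal sketch below shows what such a method might do. It is written against a toy element class (`Toy::Element`, hypothetical, not HTML::Element) so that it runs without CPAN modules. Given a list of ids, it removes the matching direct children in one call and returns them. On a real HTML::Tree element, the loop body would instead use the `look_down('id', ...)->detach` idiom from age_handler. Note that `look_down` searches all descendants, not only direct children.

```perl
use strict;
use warnings;

# Toy element class (hypothetical, NOT HTML::Element): each node has an id
# and an ordered list of child nodes.
package Toy::Element;

sub new {
    my ($class, %args) = @_;
    return bless { id => $args{id}, children => $args{children} || [] }, $class;
}

# Ids of the direct children, in document order.
sub child_ids {
    my ($self) = @_;
    return map { $_->{id} } @{ $self->{children} };
}

# Hypothetical KILL_CHILDREN: remove every direct child whose id is listed
# and return the removed nodes; ids that match nothing are ignored.
sub kill_children {
    my ($self, @ids) = @_;
    my %doomed = map { $_ => 1 } @ids;
    my (@keep, @killed);
    for my $child (@{ $self->{children} }) {
        if ($doomed{ $child->{id} }) { push @killed, $child }
        else                         { push @keep,   $child }
    }
    $self->{children} = \@keep;
    return @killed;
}

package main;

my $span = Toy::Element->new(
    id       => 'age_handler',
    children => [ map { Toy::Element->new(id => $_) } qw(under10 under18 welcome) ],
);
$span->kill_children(qw(under10 welcome));
print join(' ', $span->child_ids), "\n";    # prints "under18"
```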

Resources

There are other tree-oriented systems on CPAN. My system, HTML::Seamstress, grew out of Paul Lucas' HTML_Tree by way of Evoscript, all of which were inspired by the Java XMLC framework. XMLC compiles a web page into a Java object tree with API hooks for the various tags in the HTML. After you rewrite the little XML objects in that tree, the build method builds the HTML page.

Petal is Perl's implementation of Zope's TAL (Template Attribute Language). This framework does quite a bit, too much for me to want to figure out. At times I felt as if I were using Text::MagicTemplate, because I had to know quite a bit about what to do on the HTML side to get my Perl data into the XML properly. All I want to do on the HTML side is put little id attributes in the HTML, find 'em and rewrite 'em.

Xelig is also inspired by XMLC, but it is quite different from Seamstress. It is interesting but not so well-documented at the moment.

In reply to HTML Templating as Tree Rewriting: Part I: "If Statements" by princepawn
