Output of HTML tree built with TreeBuilder

simon.proctor has asked for the wisdom of the Perl Monks concerning the following question:

A while ago I had to build a customised HTML validation and strip tool that would work to very custom requirements. Consequently HTML Tidy wasn't exactly what we wanted and so I built a tool based on HTML::TreeBuilder.

The code runs fine but the output stage for printing has become quite monolithic (or at least it feels that way).

The code operates by first building the node tree and then processing it via walking (tramping?) over the tree and inserting/deleting nodes, adding/removing attributes and converting values (like colours) into what we want them to be. The second stage is to re-walk this tree and print it.

It's this second stage that I'm posting about. The page output has to be clean and very well formatted, but the code that does this is quite complex. Currently, I have a recursive solution supported by lists that maintain a stack of ancestors of the current node.

So we call the routine with a node, pass through a handful of if statements (creating output), determine whether the node has children (recursing if it does), pass through a few more if statements (creating more output) and return. Needless to say, the if logic is getting increasingly complex and less and less customisable.

Frankly I'm stumped as to how to refactor this. In fact I'm prepared to re-write the output stage from scratch but thought it best to get some advice first.

So does anyone have any ideas? I can post some code if required but I'm not sure how useful it would be.

Thanks in advance,

SP

Re: Output of HTML tree built with TreeBuilder
by dash2 (Hermit) on Jun 20, 2003 at 11:36 UTC

has parent  ||  is marked comment || node tag || etc...

indent with tabs || newline before || etc.

has parent   is marked comment   node tag   indent with tabs   newline before
yes          yes                 a          yes                ...
yes          yes                 b          yes                ...
...

For example, if you find out that there are only 3 main output styles, then you can rewrite the subroutine to look at the inputs, and then call one of 3 subroutines (you could put them in a dispatch table in case you need more).
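
To make the dispatch table idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the three style names, the `choose_style` and `write_node` subs, and the plain hashes standing in for real HTML::Element nodes. Your actual styles and tests will differ.

```perl
use strict;
use warnings;

# Mechanism: one small sub per output style (names are hypothetical).
my %writer_for = (
    block   => sub { my ($node, $depth) = @_; "\n" . ("\t" x $depth) . "<$node->{tag}>" },
    inline  => sub { my ($node) = @_; "<$node->{tag}>" },
    comment => sub { my ($node, $depth) = @_; ("\t" x $depth) . "<!-- $node->{tag} -->" },
);

# Policy: look at the inputs and decide which style applies.
sub choose_style {
    my ($node) = @_;
    return 'comment' if $node->{is_comment};
    return 'block'   if $node->{has_parent} && $node->{tag} =~ /^(?:div|p|table|tr|td)$/;
    return 'inline';
}

# Glue: look the style up in the table and call it.
sub write_node {
    my ($node, $depth) = @_;
    my $style  = choose_style($node);
    my $writer = $writer_for{$style} or die "No writer for style '$style'";
    return $writer->($node, $depth);
}

print write_node({ tag => 'p', has_parent => 1 }, 1), "\n";
print write_node({ tag => 'a', has_parent => 1 }, 2), "\n";
```

Adding a fourth style then means adding one entry to `%writer_for` and one line to `choose_style`, rather than threading another branch through the recursive routine.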

Or, if you think the decision is more complex, you might want to create objects to decide how to output the code. For example, you could create NodeWriter::HasParent to write out nodes with parents. Maybe table cell nodes are handled slightly differently, so NodeWriter::HasParent::Td could inherit but override some methods. Then you can decide which object to create:

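# OUTPUT is assumed to be a constant index into the array-based object,
# e.g. "use constant OUTPUT => 0;".
sub prepareOutput {
  my $self = shift;
  my ($node) = @_;
  my $writer = $self->create_nodewriter($node);
  $self->[OUTPUT] .= $writer->write_output($node);
}

sub create_nodewriter {
  my $self = shift;
  my ($node) = @_;
  # Policy: pick the writer class from the node's parent and tag.
  my $subtype = $node->parent ? 'HasParent' : 'NoParent';
  my $tagtype = ucfirst $node->tag;
  my $class   = "NodeWriter::$subtype" . "::$tagtype";
  return $class->new();
}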
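
One gap in the snippet above is that it needs a class for every tag. A rough sketch of how the hierarchy could fall back to the general writer when no tag-specific class exists is below. The class bodies, the `indent` method and the hash-based nodes are all made up for illustration; with real HTML::Element nodes you would call `$node->parent` and `$node->tag` instead.

```perl
use strict;
use warnings;

# General writer for nodes that have a parent (hypothetical implementation).
package NodeWriter::HasParent;
sub new    { my ($class) = @_; return bless {}, $class }
sub indent { "\t" }
sub write_output {
    my ($self, $node) = @_;
    return $self->indent . "<$node->{tag}>\n";
}

# Table cells inherit everything but indent one level deeper.
package NodeWriter::HasParent::Td;
our @ISA = ('NodeWriter::HasParent');
sub indent { "\t\t" }

# Root-level nodes reuse the same writer with no indent.
package NodeWriter::NoParent;
our @ISA = ('NodeWriter::HasParent');
sub indent { '' }

package main;

sub create_nodewriter {
    my ($node) = @_;
    my $subtype = $node->{parent} ? 'HasParent' : 'NoParent';
    my $class   = "NodeWriter::$subtype";
    my $tagged  = $class . '::' . ucfirst lc $node->{tag};
    # Use the tag-specific class only if it is actually defined.
    $class = $tagged if $tagged->can('new');
    return $class->new;
}

my $cell = { tag => 'td', parent => 1 };
print create_nodewriter($cell)->write_output($cell);
```

The `can('new')` check means you only write a subclass for the tags that really need special handling; everything else falls through to the general writer.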

In short what I am suggesting is: "separate policy from mechanism".

Output stage code (long)
by simon.proctor (Vicar) on Jun 20, 2003 at 10:42 UTC