A while ago I had to build a customised HTML validation and stripping tool to meet some very specific requirements. HTML Tidy wasn't quite what we wanted, so I built the tool on HTML::TreeBuilder.

The code runs fine, but the output stage that does the printing has become quite monolithic (or at least it feels that way).

The code works in two stages. The first builds the node tree and then walks (tramps?) over it, inserting and deleting nodes, adding and removing attributes, and converting values (like colours) into what we want them to be. The second stage re-walks the tree and prints it.

It's this second stage that I'm posting about. The page output has to be very well formatted and clean, but the code that produces it is quite complex. Currently I have a recursive solution supported by lists that maintain a stack of the current node's ancestors.

So we call the routine with a node, pass through a handful of if statements (creating output), check whether the node has children (and recurse if it does), pass through a few more if statements (creating more output) and return. Needless to say, the if logic is getting increasingly complex and less and less customisable.
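
To give a feel for that shape, here is a stripped-down sketch rather than the real code. The names `%BLOCK`, `emit` and `render` are invented for illustration, attributes are ignored entirely, and the two conditions shown stand in for the much longer chains of if statements in the actual routine.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::Entities qw(encode_entities);

# Hypothetical list of tags that are printed on their own, indented lines.
my %BLOCK = map { $_ => 1 } qw(html head body div p ul ol li table tr td);

# Recursive printer. @ancestors is the stack of elements above $node;
# here it only drives the indentation depth.
sub emit {
    my ($node, $out, @ancestors) = @_;
    my $indent = '  ' x @ancestors;

    # Text nodes are plain strings in an HTML::Element tree.
    if (!ref $node) {
        (my $text = $node) =~ s/^\s+|\s+$//g;
        push @$out, $indent . encode_entities($text, '<>&') if length $text;
        return;
    }

    my $tag = $node->tag;

    # Anything not listed in %BLOCK is flattened onto a single line.
    if (!$BLOCK{$tag}) {
        push @$out, "$indent<$tag>"
            . encode_entities($node->as_text, '<>&') . "</$tag>";
        return;
    }

    # Block element: open tag, recurse into the children, close tag.
    push @$out, "$indent<$tag>";
    emit($_, $out, @ancestors, $node) for $node->content_list;
    push @$out, "$indent</$tag>";
    return;
}

# Parse a fragment and pretty-print whatever ends up inside <body>.
sub render {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my $body = $tree->look_down(_tag => 'body');
    my @out;
    emit($_, \@out) for $body->content_list;
    $tree->delete;
    return join '', map { "$_\n" } @out;
}

print render('<div><p>Hello <b>world</b></p></div>');
```

Every new formatting rule in the real routine means another condition before or after the recursive call, which is where the complexity comes from.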

Frankly, I'm stumped as to how to refactor this. In fact I'm prepared to rewrite the output stage from scratch, but thought it best to get some advice first.

So, does anyone have any ideas? I can post some of the actual code if required, but I'm not sure how useful it would be.

Thanks in advance,

SP


In reply to Output of HTML tree built with TreeBuilder by simon.proctor
