Here is the code. It's a 254-line function :).

TAG - this is an object that contains the rules for controlling whether we can indent or not. It also provides one or two other convenience methods.

$mod - if it isn't obvious, this is our current indentation level: the deeper we are nested in child tags, the more we indent (one tab per level).
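To make those two pieces concrete, here is a rough sketch of how a rules object and $mod might work together. The real TAG class isn't shown in this post, so the package name My::TagRules, the rule tables and the open_tag helper are all made up for illustration; only the method names canIndentChild and isTerminal come from the code below.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the TAG rules object; the real class is not shown.
package My::TagRules;

# Illustrative rule tables only, not the real lists.
my %NO_INDENT = map { $_ => 1 } qw(a pre textarea);
my %TERMINAL  = map { $_ => 1 } qw(br hr img ~comment);

sub new { return bless {}, shift }

# True if children of $tag may be placed on their own indented lines.
sub canIndentChild {
    my ($self, $tag) = @_;
    return 0 unless defined $tag;
    return $NO_INDENT{$tag} ? 0 : 1;
}

# True if $tag never gets a separate end tag.
sub isTerminal {
    my ($self, $tag) = @_;
    return $TERMINAL{$tag} ? 1 : 0;
}

package main;

my $rules = My::TagRules->new;

# Hypothetical helper: $mod is the nesting depth, one tab per level,
# and the tabs are only emitted when the parent allows indentation.
sub open_tag {
    my ($tag, $parent, $mod) = @_;
    my $indent = $rules->canIndentChild($parent) ? "\t" x $mod : '';
    return $indent . '<' . $tag;
}

print open_tag('p', 'body', 2), "\n";   # indented by two tabs
print open_tag('b', 'a',    3), "\n";   # kept tight: parent is an anchor
```

Keeping the rules in one object means the output routine only ever asks yes/no questions, so changing which tags stay tight shouldn't require touching the traversal.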
# $mod, $current_tag, $current_parent, $old_tag and $old_parent are
# declared outside this sub.
sub prepareOutput {
    my $self = shift;
    my $node = shift;

    unless(defined($node)) {
        warn "No parsable node or tree passed.";
        return;
    }

    # Add the current node tag to the stack
    $current_tag = $node->tag;
    unshift(@{$old_tag}, $current_tag);

    # See if we are a frameset
    if($node->tag eq 'frameset') {
        $self->[FRAMESET]++;
    }

    if(DEBUG) {
        $self->[OUTPUT] .= $old_parent->[0] if defined($old_parent->[0]);
        $self->[OUTPUT] .= $old_tag->[0] if defined($old_tag->[0]);
    }

    # If we are a child
    if(defined($node->parent)) {
        # Store the parent in the parent tag - we do this in case
        # we meet some text and we don't have the parent call available
        $current_parent = $node->parent->tag;
        unshift(@{$old_parent}, $current_parent);

        # Check that the parent allows us to be indented, accommodating
        # for comments.
        # NB: && binds tighter than ||, so an <img> always takes this branch.
        if(
            $current_parent eq 'td' &&
            $node->parent->content_list < 1 &&
            $node->tag eq 'a' ||
            $node->tag eq 'img'
        ) {
            $self->[OUTPUT] .= '<' . $node->tag;
        }
        elsif($self->[TAG]->canIndentChild($current_parent)) {
            if($node->tag eq '~comment') {
                $self->[OUTPUT] .= "\t" x $mod;
                $self->[OUTPUT] .= '<!--';
            }
            else {
                $self->[OUTPUT] .= "\t" x $mod;
                $self->[OUTPUT] .= '<' . $node->tag;
            }
        }
        else {
            if($node->tag eq '~comment') {
                $self->[OUTPUT] .= '<!--';
            }
            else {
                $self->[OUTPUT] .= '<' . $node->tag;
            }
        }
    }
    else {
        # No parent, print
        if($node->tag eq '~comment') {
            $self->[OUTPUT] .= "\t" x $mod;
            $self->[OUTPUT] .= '<!--';
        }
        else {
            $self->[OUTPUT] .= "\t" x $mod;
            $self->[OUTPUT] .= '<' . $node->tag;
        }
    }

    # Print the attributes for this tag, also takes care of space preserving
    # for XHTML
    $self->printAttributes($node);

    # Determine how we end this tag, ie if it is a terminal like <br> or <hr>
    my $is_terminal = 0;
    if($self->[TAG]->isTerminal($node->tag)) {
        if($node->tag eq '~comment') {
            $self->[OUTPUT] .= ' --';
        }
        else {
            $self->[OUTPUT] .= ' /' if $self->[CONFIG]->setting('xhtml_output') eq 'yes';
        }
        $is_terminal = 1;
    }

    # This accommodates whether we indent future nodes or keep them 'tight'
    my @list = $node->content_list();
    #print $list[0],"\n";
    # NB: same precedence as above, so an <img> always takes the first branch.
    if(
        defined($current_parent) &&
        $current_parent eq 'td' &&
        $node->parent->content_list < 1 &&
        $node->tag eq 'a' ||
        $node->tag eq 'img'
    ) {
        $self->[OUTPUT] .= ">";
    }
    elsif(
        $node->tag eq 'td' &&
        $node->content_list == 1 &&
        defined($list[0]) &&
        ref($list[0]) &&
        $list[0]->tag eq 'img'
    ) {
        $self->[OUTPUT] .= '>';
    }
    elsif(
        $node->tag eq 'td' &&
        $node->content_list > 1 &&
        defined($list[0]) &&
        ref($list[0]) &&
        $list[0]->tag eq 'img'
    ) {
        $self->[OUTPUT] .= ">\n" . ("\t" x ($mod + 1));
    }
    elsif($self->[TAG]->canIndentChild($old_parent->[0]) && ($self->[TAG]->canIndentChild($old_tag->[0]))) {
        $self->[OUTPUT] .= ">\n";
    }
    else {
        $self->[OUTPUT] .= ">";
    }

    # Now traverse child nodes via recursion or, if the next bit is text,
    # print that out.
    $mod++;
    my @nodes = $node->content_list;
    my $n_count = @nodes;
    # foreach my $c ($node->content_list)
    my $c;
    for(my $i = 0; $i < $n_count; $i++) {
        $c = $nodes[$i];
        next unless defined($c);
        if(ref $c) {
            if($i > 0 && (ref($nodes[$i - 1]) && $nodes[$i - 1]->tag eq 'a')) {
                $self->[OUTPUT] .= "\n";
            }
            $self->prepareOutput($c);
        }
        elsif($i > 0 && (ref($nodes[$i - 1]) && $nodes[$i - 1]->tag eq 'a')) {
            # It's an anchor followed by some punctuation. We need the punctuation
            # on the same line for IE.
            if($c =~ /^[.,);?\:]/) {
                $self->[OUTPUT] .= substr($c, 0, 1);
                $c = substr($c, 1);
            }
            $self->[OUTPUT] .= "\n";
            $self->printNodeText($c);
        }
        else {
            $self->printNodeText($c);
        }
    }
    $mod--;

    # Just in case we had an anchor at the end and need to indent
    # NB: the pattern is unanchored, so it also matches tags such as 'span'.
    if(ref $c && $c->tag eq 'a' && $node->tag =~ /p|td/) {
        $self->[OUTPUT] .= "\n";
    }

    # If our tag was not a terminal, we must append the end tag, taking care
    # to ensure we have followed our indentation rules.
    if(!$is_terminal) {
        if(DEBUG) {
            $self->[OUTPUT] .= $old_parent->[0] if defined($old_parent->[0]);
            $self->[OUTPUT] .= $old_tag->[0] if defined($old_tag->[0]);
        }
        if($node->tag eq 'a') {
            $self->[OUTPUT] .= '</a>';
        }
        elsif(
            $node->tag eq 'td' &&
            $node->content_list < 1
        ) {
            $self->[OUTPUT] .= '</' . $node->tag . ">\n";
        }
        elsif(
            $node->tag eq 'td' &&
            $node->content_list == 1 &&
            defined($list[0]) &&
            ref($list[0]) &&
            $list[0]->tag eq 'img'
        ) {
            $self->[OUTPUT] .= '</' . $node->tag . ">\n";
        }
        elsif($self->[TAG]->canIndentChild($old_tag->[0])) {
            # Indent, then close either the comment or the tag
            $self->[OUTPUT] .= "\t" x $mod;
            $self->[OUTPUT] .= ($node->tag eq '~comment') ? " -->\n" : '</' . $node->tag . ">\n";
        }
        else {
            if($self->[TAG]->canIndentChild($old_parent->[0])) {
                $self->[OUTPUT] .= ($node->tag eq '~comment') ? " -->\n" : '</' . $node->tag . ">\n";
            }
            else {
                $self->[OUTPUT] .= ($node->tag eq '~comment') ? " -->" : '</' . $node->tag . ">";
            }
        }
    }

    # Remove our current bits off the stacks and return.
    $current_parent = shift(@{$old_parent});
    $current_tag = shift(@{$old_tag});
}
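
The fiddliest bit in there is probably the anchor-followed-by-punctuation case, so here it is pulled out on its own as a small sketch. The helper name split_leading_punct is hypothetical and not part of the module; it just isolates the same regex and substr logic so it can be tried separately.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original code): split off one leading
# punctuation character so it can stay on the same line as a closing </a>.
sub split_leading_punct {
    my ($text) = @_;
    if ($text =~ /^[.,);?:]/) {
        return (substr($text, 0, 1), substr($text, 1));
    }
    return ('', $text);
}

# The punctuation is emitted right after the anchor, then the newline,
# then whatever text is left.
my ($punct, $rest) = split_leading_punct(', and so on');
print "</a>$punct\n$rest\n";
```

Only the first character is moved, so text starting with something like "..." keeps its remaining dots on the next line.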

In reply to Output stage code (long) by simon.proctor
in thread Output of HTML tree built with TreeBuilder by simon.proctor
