in reply to X(ht)ML Source Formatting

Try the HTML::Element as_HTML() method. The example from the docs gives typical usage:

# syntax:
# $h->as_HTML($entities, $indent_char, \%optional_end_tags)
print $h->as_HTML('<>', '  ', {});

Output is indented by specifying $indent_char. In this case the HTML is indented with two spaces.

ko

Re: Re: X(ht)ML Source Formatting
by ViceRaid (Chaplain) on Aug 15, 2003 at 13:00 UTC

    Yeah, the as_HTML method of HTML::Element does produce nicely formatted output, but the output is HTML rather than XHTML (which is HTML expressed in XML). Take a look at the W3C MarkUp pages for details of the differences. XHTML requires things like lower-case element and attribute names, quoted attribute values, and closed tags, like:

    <img src="pic.gif" alt="nice picture" />

    instead of:

    <IMG src=pic.gif alt="nice picture">

    which is acceptable HTML, but not XHTML.

    HTH
    ViceRaid

      Sorry, saw that you were using HTML::TreeBuilder, and assumed you were mainly concerned with indenting.

      Actually, the modules take care of most of what you want including:

      1. makes sure there are no improperly nested elements
      2. lowercases element and attribute names automatically
      3. closes all tags, if you pass an empty hashref to the as_HTML() method (\%optional_end_tags)
      4. quotes attributes
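
      A quick way to check those four points yourself is to feed the parser some deliberately sloppy markup and look at what comes back. This is only a sketch: the input string is made up for illustration, and the exact whitespace and attribute order of the output may vary with the module version.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Made-up sloppy input: upper-case tags, an unquoted attribute,
# and <P> / <B> elements that are never closed.
my $tree = HTML::TreeBuilder->new_from_content(
    '<P>An <B>unclosed paragraph<BR><IMG SRC=pic.gif ALT="nice picture">'
);

# The empty hashref asks as_HTML() not to omit any optional end tags.
my $out = $tree->as_HTML('<>', '  ', {});
print $out;

# Free the tree explicitly when finished with it.
$tree->delete;
```

      In the printed result the tag and attribute names should be lower-case, the src value quoted, and the </b> and </p> end tags present.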

      But you still have to deal with closing empty elements like <br>, which you could fix like this (you'll have to play around with trying to fix <img> and others):

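      use strict;
      use warnings;
      use HTML::TreeBuilder;

      my $root = HTML::TreeBuilder->new;
      my $html = $root->parse_file('a.htm');
      # Swap each <br> for a ~literal pseudo-element that prints as '<br />'.
      # Build a fresh $literal per <br>: an element can have only one parent,
      # so a single shared node would end up in just one place in the tree.
      foreach my $br ($html->look_down('_tag', 'br')) {
        my $literal = HTML::Element->new('~literal', 'text' => '<br />');
        $br->replace_with($literal)->delete;
      }
      print $html->as_HTML('<>', ' ', {});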
      

      The line with $literal is kind of a kludge. I don't know if it will break the tree (it shouldn't, because these types of elements should be empty).
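
      For <img> and the other empty elements, the same kludge can probably be generalised by reusing each element's own start tag, so the attributes survive. Treat the following as a rough sketch rather than a tested recipe: self_close_empty_elements() is a hypothetical helper name, the input string is made up, and it assumes %HTML::Tagset::emptyElement is the right list of empty elements for your documents. The substitution normalises whatever ending the start tag has, so a tag that already ends in ' />' is left as it is.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::Tagset;

# Hypothetical helper: replace every empty element (br, img, hr, ...)
# with a ~literal node holding its start tag rewritten to end in ' />'.
sub self_close_empty_elements {
    my ($tree) = @_;
    my @empty = $tree->look_down(
        sub {
            my $tag = $_[0]->tag;
            # Skip pseudo-elements such as ~comment and ~literal.
            return $tag !~ /^~/ && $HTML::Tagset::emptyElement{$tag};
        }
    );
    for my $el (@empty) {
        my $start = $el->starttag;       # start tag including attributes
        $start =~ s{\s*/?>\z}{ />};      # normalise the ending to ' />'
        # One fresh literal per element, since a node has only one parent.
        my $literal = HTML::Element->new('~literal', 'text' => $start);
        $el->replace_with($literal)->delete;
    }
    return $tree;
}

# Made-up input for illustration.
my $tree = HTML::TreeBuilder->new_from_content(
    '<p>one<br>two <img src=pic.gif alt="nice picture"></p>'
);
self_close_empty_elements($tree);
my $out = $tree->as_HTML('<>', '  ', {});
print $out;
$tree->delete;
```

      This only closes the empty elements. It does not add an XML declaration, a DOCTYPE, or anything else XHTML expects, so the output still needs checking against a validator.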

      HTH - ko