in reply to Re: X(ht)ML Source Formatting
in thread X(ht)ML Source Formatting

Yeah, the as_HTML method of HTML::Element does produce nicely formatted output, but that output is HTML rather than XHTML (which is HTML expressed in XML). Take a look at the W3C MarkUp pages for details of the differences. It's things like having to use lower-case element and attribute names, quote every attribute value, and close every tag, including empty elements, like:

<img src="pic.gif" alt="nice picture" />

instead of:

<IMG src=pic.gif alt="nice picture">

which is acceptable HTML, but not XHTML.

HTH
ViceRaid

Re: Re: Re: X(ht)ML Source Formatting
by koku (Initiate) on Aug 18, 2003 at 13:21 UTC

    Sorry, saw that you were using HTML::TreeBuilder, and assumed you were mainly concerned with indenting.

    Actually, the modules take care of most of what you want, including:

    1. making sure there are no improperly nested elements
    2. automatically lowercasing element and attribute names
    3. closing all tags, if you pass an empty hashref to the as_HTML() method (\%optional_end_tags)
    4. quoting attributes
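To see the points above in action, here is a minimal sketch. The input string is made up for illustration, and the exact whitespace of the output depends on your version of HTML::Tree, so treat it as something to experiment with rather than a reference:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Deliberately sloppy input (invented for this example):
# upper-case tags, an unquoted attribute value, and no </p> anywhere.
my $tree = HTML::TreeBuilder->new_from_content(
    '<P CLASS=intro>first<P>second'
);

# Arguments: characters to entity-encode, indent string, and a hashref of
# end tags that may be omitted. An empty hashref means "omit none".
my $out = $tree->as_HTML('<>&', ' ', {});
print $out;

$tree->delete;    # free the tree explicitly
```

In the printed markup you should find the tag and attribute names lower-cased, the attribute value quoted, and each paragraph closed with its own end tag.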

    But you still have to deal with closing empty elements like <br>, which you could fix like this (you'll have to play around with it to fix <img> and others):

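    use strict;
    use warnings;
    use HTML::TreeBuilder;

    my $root = HTML::TreeBuilder->new;
    $root->parse_file('a.htm') or die "Can't parse a.htm: $!";

    # Swap every <br> for a pseudo-element whose text is emitted verbatim.
    foreach my $br ($root->look_down('_tag', 'br')) {
      # A node can have only one parent, so build a fresh literal each time.
      my $literal = HTML::Element->new('~literal', 'text' => '<br />');
      $br->replace_with($literal)->delete;
    }

    # '<>' = characters to entity-encode, ' ' = indent string,
    # {} = no optional end tags, so every element gets closed.
    print $root->as_HTML('<>', ' ', {});
    $root->delete;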
    

    The line with $literal is kind of a kludge. I don't know if it will break the tree (it shouldn't, because these types of elements should be empty).
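If the literal trick feels too hacky, another option worth trying is the as_XML method of HTML::Element, which writes empty elements in self-closing form on its own. This is only a sketch with made-up sample markup. Note that as_XML does not indent its output, and it adds no DOCTYPE or namespace declaration, so the result is not guaranteed to be valid XHTML without further work:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample markup invented for this example: it contains two empty
# elements (<br> and <img>) written the old HTML way.
my $tree = HTML::TreeBuilder->new_from_content(
    '<P>line one<BR>line two<IMG SRC=pic.gif ALT="nice picture">'
);

# Serialise as XML instead of HTML; no ~literal substitution is needed.
my $xml = $tree->as_XML;
print $xml;

$tree->delete;    # free the tree explicitly
```

That handles <br>, <img> and the other empty elements in one go, at the cost of losing the indentation that as_HTML gives you.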

    HTH - ko