in reply to HTML::TreeBuilder bug or feature?

Update: I saw GrandFather's reply. There is some misunderstanding, which I guess was my fault; I could have expressed myself more clearly. I have never doubted the merit of his code. This reply was not, and is not, about whether his original HTML was valid. What I am saying is that as_HTML converts a valid document into an invalid one, and that is a bug. To prove that, I needed an HTML document that is valid in a stricter sense, one that passes the validation service, and that is the only reason I modified the original code. Otherwise, the HTML both before and after conversion fails validation, and I cannot prove my point. No worries, GrandFather ;-)

=========================================

This is a bug according to the HTML 4.01 specification. You do not need to be familiar with the specification; we can use the W3C validation service to check the HTML documents in this reply.

I modified your code slightly so that it contains a valid HTML document. That document passes W3C validation as "tentatively valid".

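```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Slurp the whole __DATA__ section (undef $/ reads to end of file;
# $/ = "" would be paragraph mode and stop at the first blank line)
my $data = do { local $/; <DATA> };

my $tree = HTML::TreeBuilder->new;
$tree->store_comments(1);        # keep <!-- ... --> nodes in the tree
$tree->store_declarations(1);    # keep <!DOCTYPE ...> in the tree
$tree->parse($data);
$tree->eof();

print $tree->as_HTML();
$tree->delete;                   # release the tree's memory

__DATA__
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<HTML>
<HEAD>
<TITLE>My first HTML document</TITLE>
</HEAD>
<BODY>
<P>Hello world!
</BODY>
</HTML>
```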

Running this program generates:

```html
<html><head><title>My first HTML document</title></head><body><p>Hello world! </body><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"></html>
```

The generated HTML does not pass the validation service: it complains that the DOCTYPE cannot be found and is misplaced.
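A rough way to see the validator's complaint without leaving Perl is to check whether the DOCTYPE is the first markup in each document. This is a minimal sketch, not a validator: the helper name `doctype_first` is hypothetical, it only skips leading whitespace, and the two sample strings are condensed copies of the input and output above.

```perl
use strict;
use warnings;

# Hypothetical helper: true if, after optional leading whitespace, the
# string starts with a DOCTYPE declaration. It checks nothing else.
sub doctype_first {
    my ($html) = @_;
    return $html =~ /\A\s*<!DOCTYPE\s/i ? 1 : 0;
}

# Condensed copies of the original input and the as_HTML output
my $input  = q{<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<HTML><HEAD><TITLE>My first HTML document</TITLE></HEAD><BODY><P>Hello world!</BODY></HTML>};
my $output = q{<html><head><title>My first HTML document</title></head><body><p>Hello world! </body><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"></html>};

for my $case ([input => $input], [output => $output]) {
    my ($name, $html) = @$case;
    printf "%-6s: %s\n", $name,
        doctype_first($html) ? 'DOCTYPE comes first' : 'DOCTYPE missing or misplaced';
}
```

For the input it reports that the DOCTYPE comes first; for the as_HTML output it reports the DOCTYPE as missing or misplaced, which mirrors what the validator says.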

Re^2: HTML::TreeBuilder bug or feature?
by GrandFather (Saint) on Sep 07, 2005 at 04:20 UTC

    I trimmed the !DOCTYPE tag contents down because I didn't need more than that to demonstrate the problem, which is that the declaration tag and the comment tag end up after the body when the HTML is generated. Actually, that is the way they are stored in $tree. Swapping the two problematic tag entries in $tree->{_content} to the start of the array "fixes" the problem.
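    Moving those entries amounts to a stable partition of the content array. The following is a minimal sketch under assumptions: it relies on HTML::Element's internal layout (children in `_content`, tag name in `_tag`, text nodes as plain strings), the helper `hoist_decls_and_comments` is hypothetical, and plain hashes stand in for real element objects. Since the output above shows the declaration as a child of the `html` element, moving it to the front of that array would place it just after `<html>`, not before it.

```perl
use strict;
use warnings;

# Hypothetical helper: return the children with declaration and comment
# nodes moved to the front; relative order within each group is kept.
sub hoist_decls_and_comments {
    my @children = @_;
    my $is_decl_or_comment = sub {
        my ($node) = @_;
        # Text nodes are plain strings, so only references can be pseudo-elements
        return ref($node)
            && ($node->{_tag} // '') =~ /\A~(?:declaration|comment)\z/;
    };
    my @front = grep {  $is_decl_or_comment->($_) } @children;
    my @rest  = grep { !$is_decl_or_comment->($_) } @children;
    return (@front, @rest);
}

# Plain hashes standing in for HTML::Element nodes (assumption: only _tag matters)
my @content = (
    { _tag => 'head' },
    { _tag => 'body' },
    { _tag => '~declaration' },
    { _tag => '~comment' },
);
my @fixed = hoist_decls_and_comments(@content);
print join(' ', map { $_->{_tag} } @fixed), "\n";

# On a real tree this would be the (untested) equivalent of:
#   $tree->{_content} = [ hoist_decls_and_comments(@{ $tree->{_content} }) ];
```

    This prints `~declaration ~comment head body`.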

    The sample "HTML" is not intended to be valid beyond the extent needed to demonstrate the problem.


    Perl is Huffman encoded by design.

      I guess it was my fault; please see my update above. I understand perfectly that we all cherish our programs and don't want others to modify them, as if there were something wrong with them.

      I really don't want you to take it the wrong way, and I am sorry if I made you feel bad; that was not my intention. My intention was definitely not what you thought it was, and I was not commenting on your code. I was merely trying to prove that it was a bug from the HTML specification's point of view.

      Put another way: had the converted HTML, with the DOCTYPE at its end, also passed validation as the original did, I would probably have said that it was not a bug.

        Please accept my apology for misunderstanding your reply. I thought you had misunderstood the nature of the bug I was highlighting and had latched on to something spurious. You had not, and I now understand what you were doing.

        Cheers, GrandFather

