tphyahoo has asked for the wisdom of the Perl Monks concerning the following question:

I am completely puzzled by the behavior I am seeing when HTML::TreeBuilder parses a simple HTML text string. It seems to be chopping the last word off my HTML: where I expect "Some more text." the dump shows just "Some more".

The HTML is missing the opening <head>/<body> markup and is therefore technically malformed, but I thought TreeBuilder was able to compensate for this in an intelligent way. Here's the code:

use strict;
use warnings;
use HTML::TreeBuilder;

my $treeroot = HTML::TreeBuilder->new;
#$treeroot->store_comments(1);
my $whole_file = 'Some text. Some more text.';
$treeroot->parse( $whole_file );
#$treeroot->elementify(); # elementify doesn't matter either way.
$treeroot->dump();

=output:
<html> @0 (IMPLICIT)
  <head> @0.0 (IMPLICIT)
  <body> @0.1 (IMPLICIT)
    "Some text. Some more"
=cut

I am starting to doubt my sanity here. Anybody got any ideas?

Re: Why is HTML::TreeBuilder chopping the last word off my HTML?
by borisz (Canon) on Oct 17, 2005 at 11:33 UTC
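    You forgot to call ->eof. parse() can be fed a document in several chunks, so it may hold back trailing text until eof() tells it the input is complete.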
    use strict;
    use warnings;
    use HTML::TreeBuilder;

    my $treeroot = HTML::TreeBuilder->new;
    #$treeroot->store_comments(1);
    my $whole_file = 'Some text. Some more text.';
    $treeroot->parse( $whole_file );
    $treeroot->eof();
    #$treeroot->elementify(); # elementify doesn't matter either way.
    $treeroot->dump();
    Boris
      Yep, that was the problem. Thanks borisz!
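
For what it's worth, when the whole document is already in one string, the explicit eof() call can be avoided. The following is a minimal sketch, assuming an HTML::TreeBuilder version that provides parse_content() and new_from_content(); both are documented as parsing their arguments and then calling eof() for you. The variable names ($tree1, $tree2, $text1, $text2) are only illustrative and are not from the thread.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $whole_file = 'Some text. Some more text.';

# Option 1: parse_content() is documented as parse() followed by eof().
my $tree1 = HTML::TreeBuilder->new;
$tree1->parse_content($whole_file);
my $text1 = $tree1->as_text;

# Option 2: new_from_content() constructs the tree, parses, and calls eof().
my $tree2 = HTML::TreeBuilder->new_from_content($whole_file);
my $text2 = $tree2->as_text;

print "parse_content:    $text1\n";
print "new_from_content: $text2\n";

# delete() is the traditional way to free a tree once you are done with it.
$_->delete for $tree1, $tree2;
```

If the input arrives in pieces (for example, read from a socket), the parse()/eof() pair from the reply above is still the right shape: call parse() once per chunk and eof() once at the end.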