Why is HTML::TreeBuilder chopping the last word off my HTML?

tphyahoo has asked for the wisdom of the Perl Monks concerning the following question:

I am completely puzzled by the behavior I see when HTML::TreeBuilder parses a simple HTML string. It seems to be chopping the last word off my HTML: where I expect "Some more text." the dump shows only "Some more".

The HTML is missing the opening <head>/<body> tags, and is therefore technically malformed, but I thought TreeBuilder could compensate for this intelligently. Here's the code:

use strict;
use warnings;
use HTML::TreeBuilder;

my $treeroot = HTML::TreeBuilder->new;
#$treeroot->store_comments(1);

my $whole_file = 'Some text. Some more text.';
$treeroot->parse( $whole_file );
#$treeroot->elementify(); # elementify doesn't matter either way.
$treeroot->dump();

=output: 
<html> @0 (IMPLICIT)
  <head> @0.0 (IMPLICIT)
  <body> @0.1 (IMPLICIT)
    "Some text. Some more"
=cut

I am starting to doubt my sanity here. Anybody got any ideas?

Re: Why is HTML::TreeBuilder chopping the last word off my HTML? by borisz (Canon) on Oct 17, 2005 at 11:33 UTC
You forgot to call ->eof.

use strict;
use warnings;
use HTML::TreeBuilder;

my $treeroot = HTML::TreeBuilder->new;
#$treeroot->store_comments(1);

my $whole_file = 'Some text. Some more text.';
$treeroot->parse( $whole_file );
$treeroot->eof();
#$treeroot->elementify(); # elementify doesn't matter either way.
$treeroot->dump();

Boris
Re^2: Why is HTML::TreeBuilder chopping the last word off my HTML? by tphyahoo (Vicar) on Oct 17, 2005 at 11:42 UTC
Yep, that was the problem. Thanks borisz!
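
For anyone landing here later, a short sketch of why ->eof matters. parse() can be called repeatedly with chunks of a document, so the parser may hold back trailing text until it is told that no more input is coming; eof() is that signal. The two helper subs below (text_via_parse_eof and text_via_new_from_content) are hypothetical names made up for this illustration and are not part of HTML::TreeBuilder. The second relies on the new_from_content constructor, which, as far as the module documentation describes it, parses its arguments and calls eof for you.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: parse() followed by an explicit eof(), as in borisz's reply.
sub text_via_parse_eof {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->parse($html);
    $tree->eof;                  # signal end of input so held-back text is flushed
    my $text = $tree->as_text;   # concatenated text content of the tree
    $tree->delete;               # release the tree explicitly when done
    return $text;
}

# Hypothetical helper: new_from_content() builds the tree, parses, and calls eof.
sub text_via_new_from_content {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my $text = $tree->as_text;
    $tree->delete;
    return $text;
}

my $whole_file = 'Some text. Some more text.';
print text_via_parse_eof($whole_file), "\n";
print text_via_new_from_content($whole_file), "\n";
```

With either helper the final word should no longer go missing. If the whole document is already in one string, the one-step constructor leaves less room to forget the eof call; parse() plus eof() is the better fit when the input arrives in pieces.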