Re^3: More efficient use of HTML::TokeParser::Simple

Here's a trivial example that seems to do something like what you want and may be enough to get you started with TreeBuilder:

use warnings;
use strict;
use HTML::TreeBuilder;

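# Slurp the whole sample document from the DATA section
my $html = do {local $/; <DATA>};
my $tree = HTML::TreeBuilder->new ();

$tree->parse ($html);
$tree->eof ();    # signal end of input so the tree is completed
$tree->elementify();

# In list context find () returns every matching element
my ($title) = $tree->find ('title');
my @h1 = $tree->find ('h1');

print $title->as_text (), "\n";
print $_->as_text (), "\n" for @h1;

$tree->delete ();    # free the tree explicitly once finished with it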

__DATA__
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- Took this out for IE6ites "http://www.w3.org/TR/REC-html40/loose.dtd" -->
<html lang="en">
  <head>
    <title>More efficient use of HTML::TokeParser::Simple perlquestion id:560199</title>
  </head>
  <body>
  <h1>Header 1</h1>
  <p>First paragraph</p>
  <h1>Header 2</h1>
  <p>Second paragraph</p>
  <h2>Level 2 header 1</h2>
  </body>    
</html>

Prints:

More efficient use of HTML::TokeParser::Simple perlquestion id:560199
Header 1
Header 2
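
If you want headers of every level rather than just h1, a possible variation is sketched below. It assumes look_down () with the '_tag' pseudo-attribute and a regex value, and new_from_content () as a shortcut for new/parse/eof; the sample markup is a hypothetical cut-down copy of the DATA section above, so treat it as a starting point rather than a finished solution.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup, modelled on the DATA section above
my $html = <<'HTML';
<html><head><title>Sample</title></head>
<body>
<h1>Header 1</h1>
<p>First paragraph</p>
<h1>Header 2</h1>
<h2>Level 2 header 1</h2>
</body></html>
HTML

# new_from_content () parses the string and finishes the tree in one call
my $tree = HTML::TreeBuilder->new_from_content ($html);

# '_tag' holds the element name; a regex value matches h1 .. h6 in one
# pass and the matches come back in document order
my @headings = $tree->look_down (_tag => qr/^h[1-6]$/);

# Copy out the text before the tree is deleted
my @lines = map {$_->tag () . ': ' . $_->as_text ()} @headings;

print "$_\n" for @lines;

$tree->delete ();
```

Each line of output carries the tag name followed by the header text, so the level of each header is kept alongside its content.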

DWIM is Perl's answer to Gödel

Re^4: More efficient use of HTML::TokeParser::Simple by wfsp (Abbot) on Oct 30, 2008 at 16:07 UTC
What does `$tree->elementify();` do here? It appears to run OK if it is commented out. I've often seen it in snippets and have no idea what purpose it serves.
Re^5: More efficient use of HTML::TokeParser::Simple by GrandFather (Saint) on Oct 30, 2008 at 19:52 UTC
The HTML::TreeBuilder documentation is a good place to start. It says of elementify ():

    This changes the class of the object in $root from HTML::TreeBuilder to the class used for all the rest of the elements in that tree (generally HTML::Element). Returns $root.

and goes on to say:

    For most purposes, this is unnecessary, but if you call this after (after!!) you've finished building a tree, then it keeps you from accidentally trying to call anything but HTML::Element methods on it. ...

Perl reduces RSI - it saves typing
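
A quick way to see the effect for yourself is to check the class of the root object before and after the call. This is only a minimal sketch: the one-line HTML string is made up for illustration, and the can ('parse') check assumes that parse comes from the parser side of HTML::TreeBuilder rather than from HTML::Element.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new ();

# Made-up one-line document, just enough to build a tree
$tree->parse ('<html><body><h1>Header 1</h1></body></html>');
$tree->eof ();

# Record the class and whether the parser method is still reachable
my $before    = ref $tree;
my $parse_pre = $tree->can ('parse') ? 'yes' : 'no';

# Rebless the root into the ordinary element class
$tree->elementify ();

my $after      = ref $tree;
my $parse_post = $tree->can ('parse') ? 'yes' : 'no';

print "before: $before (can parse: $parse_pre)\n";
print "after:  $after (can parse: $parse_post)\n";

$tree->delete ();
```

The class should change from HTML::TreeBuilder to HTML::Element, which is why the original script behaves the same with the call commented out: find () and as_text () are HTML::Element methods and are available either way.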