in reply to Re^2: More efficient use of HTML::TokeParser::Simple
in thread More efficient use of HTML::TokeParser::Simple

Here's a trivial example that seems to do something like what you want and may be enough to get you started with TreeBuilder:

use warnings;
use strict;
use HTML::TreeBuilder;

# Slurp the sample document from the __DATA__ section below
my $html = do {local $/; <DATA>};
my $tree = HTML::TreeBuilder->new ();

# Parse the document and signal end of input
$tree->parse ($html);
$tree->eof ();
$tree->elementify();

# find () returns every element with the given tag name in list context
my ($title) = $tree->find ('title');
my @h1 = $tree->find ('h1');

print $title->as_text (), "\n";
print $_->as_text (), "\n" for @h1;

__DATA__
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- Took this out for IE6ites "http://www.w3.org/TR/REC-html40/loose.dtd" -->
<html lang="en">
<head>
<title>More efficient use of HTML::TokeParser::Simple perlquestion id:560199</title>
</head>
<body>
<h1>Header 1</h1>
<p>First paragraph</p>
<h1>Header 2</h1>
<p>Second paragraph</p>
<h2>Level 2 header 1</h2>
</body>
</html>

Prints:

More efficient use of HTML::TokeParser::Simple perlquestion id:560199
Header 1
Header 2

DWIM is Perl's answer to Gödel

Re^4: More efficient use of HTML::TokeParser::Simple
by wfsp (Abbot) on Oct 30, 2008 at 16:07 UTC
    What does
    $tree->elementify();
    do here? It appears to run ok if it is commented out. I've often seen it in snippets and have no idea what purpose it serves.

      The HTML::TreeBuilder documentation is a good place to start. It says that elementify ():

      This changes the class of the object in $root from HTML::TreeBuilder to the class used for all the rest of the elements in that tree (generally HTML::Element). Returns $root.

      and goes on to say:

      For most purposes, this is unnecessary, but if you call this after (after!!) you've finished building a tree, then it keeps you from accidentally trying to call anything but HTML::Element methods on it. ...
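
To see the effect the documentation describes, here is a small sketch (not from the original post; the inline HTML string and the variable names are made up for illustration) that records the object's class before and after the call. It assumes HTML::TreeBuilder is installed from CPAN.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample document, just enough to build a tree
my $tree = HTML::TreeBuilder->new;
$tree->parse('<html><head><title>Demo</title></head><body><h1>Header</h1></body></html>');
$tree->eof;

# Freshly built, the root object is still a builder
my $class_before = ref $tree;

# Call elementify only after the tree is complete; it changes the class of
# the root object to the one used for the rest of the tree
# (generally HTML::Element, per the documentation)
$tree->elementify;
my $class_after = ref $tree;

# HTML::Element methods such as find and as_text work either way,
# which is why the original script runs with or without the call
my $title_text = $tree->find('title')->as_text;

print "before: $class_before\n";
print "after:  $class_after\n";
print "title:  $title_text\n";

$tree->delete;
```

The call is there for safety rather than function: once the root is an ordinary element, calling a builder-only method such as parse on the finished tree by accident should fail outright instead of quietly altering it.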

      Perl reduces RSI - it saves typing