Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

The HTML::TokeParser::Simple docs say there is an is_start_tag method, and they include an example that uses it to collect every level of heading tag. That's exactly what I'm trying to collect.

This code errors out with: Can't locate object method "is_start_tag" via package "HTML::TokeParser::Simple"

my $headtags = '';
use HTML::TokeParser::Simple;
my $htm5 = HTML::TokeParser::Simple->new( \$src);
while ( my $tkn = $htm5->get_token ) {
    if ($htm5->is_start_tag( qr/^h[123456]$/ )) {
        next if (!$htm5->get_text);
        $headtags= $headtags . " " . $htm5->get_text;
    }
}
print "HEADing TAGS: $headtags\n\n\n\n"

Re: is_start_tag not on tokeparser simple
by Corion (Patriarch) on Mar 22, 2006 at 17:17 UTC

    The HTML::TokeParser::Simple documentation states that the ->is_start_tag method should be called on the token, not on the parser object:

    if ( $token->is_start_tag( qr/^h[123456]$/ ) ) { ... }

    So your code should call is_start_tag on $tkn, where it currently calls it on $htm5.
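A minimal sketch of that distinction, assuming HTML::TokeParser::Simple is installed. The sub name count_heading_tags and the sample HTML are hypothetical, made up for illustration. The parser object only hands out tokens; the is_* tests such as is_start_tag live on each token.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper: count <h1>..<h6> start tags in an HTML string.
# is_start_tag is a *token* method, so it is called on $token,
# never on the parser object $parser.
sub count_heading_tags {
    my ($html) = @_;
    my $parser = HTML::TokeParser::Simple->new( \$html );
    my $count  = 0;
    while ( my $token = $parser->get_token ) {
        # Text, comment and end-tag tokens simply return false here.
        $count++ if $token->is_start_tag( qr/^h[1-6]$/ );
    }
    return $count;
}

my $sample = '<h1>Title</h1><p>text</p><h2>Sub</h2><h3>Minor</h3>';
print count_heading_tags($sample), "\n";    # prints 3
```

Note that qr/^h[1-6]$/ is anchored, so tags like <head> or <hr> are not counted.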

      Thanks!

      It runs without errors now, but it doesn't find any heading tags. I left the other $htm5 calls as they were, because changing every instance to $tkn made it error out again.

      use warnings;
      use strict;
      use LWP::Simple;
      my $url = "http://www.w3schools.com/html/html_primary.asp"; # this has lots of <h#> tags
      my $src = get($url);
      my $headtags = '';
      use HTML::TokeParser::Simple;
      my $htm5 = HTML::TokeParser::Simple->new(\$src);
      while ( my $tkn = $htm5->get_token ) {
          if ($tkn->is_start_tag( qr/^h[123456]$/ )) {
              next if (!$htm5->get_text);
              $headtags= $headtags . " " . $htm5->get_text;
          }
      }
      print "HEAD TAGS: $headtags\n\n\n\n"
      Thanks.

        Your first call to get_text (the one that checks whether it returns a value) consumes the text, so the second call has nothing left to return. You probably want something more along the lines of:

        my $text = $htm5->get_text;
        if ($text) {
            $headtags .= " $text";
        }
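Putting both fixes together, here is a sketch of a complete version. It assumes an inline HTML string instead of fetching the w3schools page with LWP::Simple, and the sub name heading_texts is hypothetical. It also swaps in get_trimmed_text with the matching end tag (e.g. "/h2") instead of a bare get_text: with no arguments, get_text stops at the first tag it meets, so a heading like <h2><a name="x">Linked</a> heading</h2> would come back empty.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper: return the text of every <h1>..<h6> element
# in $html, in document order. get_trimmed_text is called exactly once
# per heading and its result is stored, because each call consumes
# tokens from the parser.
sub heading_texts {
    my ($html) = @_;
    my $parser = HTML::TokeParser::Simple->new( \$html );
    my @headings;

    while ( my $token = $parser->get_token ) {
        next unless $token->is_start_tag( qr/^h[1-6]$/ );

        my $tag = $token->get_tag;    # e.g. "h2"

        # Passing "/h2" makes it read up to the matching end tag, so
        # inline markup such as <a> inside the heading does not cut the
        # text short. Whitespace is collapsed and trimmed.
        my $text = $parser->get_trimmed_text("/$tag");
        push @headings, $text if length $text;    # skip empty headings
    }
    return @headings;
}

my $html = <<'HTML';
<h1>Primary</h1>
<p>Body text</p>
<h2><a name="x">Linked</a> heading</h2>
<h3></h3>
HTML

my $headtags = join ' ', heading_texts($html);
print "HEAD TAGS: $headtags\n";
```

Returning a list and joining it at the end avoids the leading space that building $headtags with " $text" leaves behind.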