in reply to Re: is_start_tag not on tokeparser simple
in thread is_start_tag not on tokeparser simple

Thanks!

It runs without error now, but it doesn't find any heading tags. I first changed every instance of $htm5 to $tkn, which made it error out again, so I left the other $htm5 calls as they were.

use warnings;
use strict;
use LWP::Simple;

my $url = "http://www.w3schools.com/html/html_primary.asp"; # this has lots of <h#> tags
my $src = get($url);
my $headtags = '';

use HTML::TokeParser::Simple;
my $htm5 = HTML::TokeParser::Simple->new(\$src);
while ( my $tkn = $htm5->get_token ) {
    if ( $tkn->is_start_tag( qr/^h[123456]$/ ) ) {
        next if (!$htm5->get_text);
        $headtags = $headtags . " " . $htm5->get_text;
    }
}
print "HEAD TAGS: $headtags\n\n\n\n";

Thanks.

Re^3: is_start_tag not on tokeparser simple
by revdiablo (Prior) on Mar 22, 2006 at 21:50 UTC

    Your first call to get_text (where you check whether it has a value) consumes the text, so the second call no longer returns it. You probably want something more along the lines of:

    my $text = $htm5->get_text;
    if ($text) {
        $headtags .= " $text";
    }

      You were totally right! It works great now. Can you explain why testing for the value messes up the contents? I don't understand why it does that.

        When you call get_text, it returns the text segment, but it also advances the parser to the next token. So the next time you call it, it doesn't return the same thing. Thus you have to store its return value if you want to use it again.
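
To see the consuming behaviour in isolation, here is a minimal sketch. It assumes a small made-up HTML string ($html) in place of the fetched page, and the names $p, $first, $second and $parser are only for illustration. The first half calls get_text twice in a row after an <h1> start tag; the second half is the corrected loop that stores the return value once.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical stand-in for the downloaded page, so no network is needed.
my $html = '<h1>First</h1><p>body</p><h2>Second</h2>';

# Part 1: call get_text twice right after the <h1> start tag.
my $p = HTML::TokeParser::Simple->new(\$html);
while ( my $t = $p->get_token ) {
    last if $t->is_start_tag('h1');
}
my $first  = $p->get_text;   # reads the text token "First" and moves past it
my $second = $p->get_text;   # next token is </h1>, so there is no text left here
print "first:  '$first'\n";  # first:  'First'
print "second: '$second'\n"; # second: ''

# Part 2: the corrected loop, storing the text in a variable before testing it.
my $parser   = HTML::TokeParser::Simple->new(\$html);
my $headtags = '';
while ( my $tkn = $parser->get_token ) {
    next unless $tkn->is_start_tag( qr/^h[123456]$/ );
    my $text = $parser->get_text;            # one call per heading
    $headtags .= " $text" if length $text;   # reuse the stored value
}
print "HEAD TAGS:$headtags\n";               # HEAD TAGS: First Second
```

The original loop behaves like Part 1: the test inside "next if" consumes the heading text, and the second get_text then finds only the closing tag and returns an empty string.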