in reply to HTML::TokeParser help - parsing headlines
If you switch to HTML::TokeParser::Simple, I think you'll be happy with how much clearer the logic is.
    use strict;
    use HTML::TokeParser::Simple;
    use LWP::Simple;
    use URI;

    my $url = 'http://www.reuters.com/newsEarlierArticles.jhtml?type=businessNews';

    # get() returns undef on failure and does not set $!, so test it directly.
    my $html = get($url);
    die "Couldn't read $url\n" unless defined $html;

    my $stream = HTML::TokeParser::Simple->new(\$html);

    while (my $token = $stream->get_token) {
        # Skip everything except <td class="earlyHeadline"> cells.
        next unless $token->is_start_tag('td')
            and ($token->return_attr('class') || '') eq 'earlyHeadline';

        # The headline link is expected to be the very next token.
        my $next = $stream->get_token or last;
        if ($next->is_start_tag('a')) {
            print URI->new_abs($next->return_attr('href'), $url), "\n";
        }
    }
Cheers,
Ovid
New address of my CGI Course.
Re: Re: HTML::TokeParser help - parsing headlines
by perleager (Pilgrim) on Mar 07, 2004 at 09:21 UTC