perleager has asked for the wisdom of the Perl Monks concerning the following question:
    <tr><td class="earlyHeadline"><a href="newsArticle.jhtml?type=businessNews&storyID=4511892&section=news">SEC Targets More Fortune 500 Names</a></td></tr>

...and so on, with one such row for each headline displayed.
    #!/usr/bin/perl -w
    use strict;
    use HTML::TokeParser;
    use LWP::Simple;
    use URI;    # needed for URI->new_abs below

    print "Content-type: text/html\n\n";

    my $url      = 'http://www.reuters.com/newsEarlierArticles.jhtml?type=businessNews';
    my $filename = 'temp.html';

    # Fetch the page and keep a local copy for the parser.
    my $content = get($url);
    defined $content or die "Couldn't fetch $url";
    open FH, '>', $filename or die "Couldn't write $filename: $!";
    print FH $content;
    close FH;

    # Pass the variable itself: single quotes would hand over the literal string '$filename'.
    my $stream = HTML::TokeParser->new($filename)
        || die "Couldn't read HTML file $filename: $!";

    Token:
    while (my $token = $stream->get_token) {
        if (    $token->[0] eq 'S'
            and $token->[1] eq 'td'
            and ($token->[2]{'class'} || '') eq 'earlyHeadline')
        {
            my (@next) = ($stream->get_token);
            if (    $next[0]
                and $next[0][0] eq 'S'
                and $next[0][1] eq 'a'
                and defined $next[0][2]{'href'})
            {
                # early headline found for business section/grab a href portion
                # (resolved against the page URL rather than the local file name)
                print URI->new_abs($next[0][2]{'href'}, $url), "\n";
                next Token;
            }
        }
    }
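For trying the parsing step without hitting the network, here is a minimal sketch that runs HTML::TokeParser over an in-memory string. It is only an illustration under a few assumptions: the sample row is adapted from the one quoted above (with the ampersands written as `&amp;`), the second row is made up to show a non-matching cell, the base URL is the one the script fetches, and `extract_headlines` is a hypothetical helper name, not part of any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;
use URI;

# Assumption: relative links are resolved against the page the script fetches.
my $base = 'http://www.reuters.com/newsEarlierArticles.jhtml?type=businessNews';

# Sample markup: first row adapted from the question, second row invented
# purely to show a cell that should be skipped.
my $html = <<'HTML';
<table>
<tr><td class="earlyHeadline"><a href="newsArticle.jhtml?type=businessNews&amp;storyID=4511892&amp;section=news">SEC Targets More Fortune 500 Names</a></td></tr>
<tr><td class="other"><a href="ignored.jhtml">Not a headline</a></td></tr>
</table>
HTML

# Hypothetical helper: returns a list of { title => ..., url => ... } hashes,
# one per <a> found inside a <td class="earlyHeadline"> cell.
sub extract_headlines {
    my ($html, $base) = @_;

    # A scalar reference makes TokeParser parse the string itself.
    my $stream = HTML::TokeParser->new(\$html)
        or die "Couldn't parse HTML string";

    my @headlines;
    while (my $td = $stream->get_tag('td')) {
        next unless ($td->[1]{class} || '') eq 'earlyHeadline';

        # Stop at whichever comes first: a link, or the end of this cell.
        my $link = $stream->get_tag('a', '/td') or last;
        next unless $link->[0] eq 'a' and defined $link->[1]{href};

        my $title = $stream->get_trimmed_text('/a');
        my $url   = URI->new_abs($link->[1]{href}, $base)->as_string;
        push @headlines, { title => $title, url => $url };
    }
    return @headlines;
}

# Print one "title <TAB> absolute URL" line per headline found.
for my $h (extract_headlines($html, $base)) {
    print "$h->{title}\t$h->{url}\n";
}
```

Using `get_tag` and `get_trimmed_text` instead of raw `get_token` calls keeps the loop short, and it also picks up the headline text along with the link. Whether the real page nests the `<a>` directly inside the `<td>` the way the sample does is something to check against the saved `temp.html`.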
Replies are listed 'Best First'.

Re: HTML::TokeParser help - parsing headlines
by Enlil (Parson) on Mar 07, 2004 at 01:41 UTC
Re: HTML::TokeParser help - parsing headlines
by Ovid (Cardinal) on Mar 07, 2004 at 04:21 UTC
by perleager (Pilgrim) on Mar 07, 2004 at 09:21 UTC
Re: HTML::TokeParser help - parsing headlines
by Popcorn Dave (Abbot) on Mar 07, 2004 at 01:26 UTC
by graff (Chancellor) on Mar 07, 2004 at 05:36 UTC
by Popcorn Dave (Abbot) on Mar 07, 2004 at 20:17 UTC
Re: HTML::TokeParser help - parsing headlines
by sheep (Chaplain) on Mar 07, 2004 at 01:50 UTC