Quick Yahoo Top Stories Parser

A quick and dirty script that sucks down the Top Stories from Yahoo's page and spits each one out as an <LI> list item. That's about it. Useful from cron or for pulling into SSI pages. (Do they have an RDF/RSS feed anywhere?)

#!/usr/bin/perl
use strict;
use LWP::Simple;

# declare our variables for safety
my ($sawFCEnd, $sawBR, $link, $title);

# suck down the webpage
my $content = get("http://dailynews.yahoo.com/htx/ts/")
   or die "Couldn't fetch the Top Stories page\n";

# split the content on each new line, and loop through the lines
foreach(split(/\n/, $content)) {

   # if we see FCEnd, we're getting close
   if (/<!-- FCEnd -->/) { $sawFCEnd=1; next; };

   # if we see two <br>'s on a line, we're even closer
   if ($sawFCEnd && /<br><br>/) { $sawBR=1; next; };
   
   # if we saw the two <br>'s, this must be our news
   # link, OR if the line starts with <a href=... we
   # must have a news link too...
   if ($sawBR or /^<a href=/) { 
      
      # clear the variables for the next loop
      $sawFCEnd=0; $sawBR=0;
     
      # grep the link and title into variables
      ($link, $title) = (/<a href="(.*)"><b>(.*)<\/b><\/a>/);

      # print a line only if there's a $link
      print "<LI><A HREF=\"$link\">$title</A>\n" if defined($link);
   }

   # start over again
   next;
}
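
For the cron/SSI use mentioned at the top, one possible approach is to have the script write its list items to a file instead of STDOUT, so the web server can include that file. The sketch below is only an illustration: `write_fragment`, the sample items, and the `topstories.html` path are all made-up names, not part of the script above. It writes to a temporary file first and renames it into place, so a page request never sees a half-written list.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: write the list items to $path by way of a temporary
# file in the same directory, then rename it into place so a reader never
# sees a partially written file.
sub write_fragment {
    my ($path, @items) = @_;
    my ($fh, $tmp) = tempfile(DIR => dirname($path));
    print {$fh} "$_\n" for @items;
    close($fh) or die "Cannot close $tmp: $!";
    # tempfile() creates the file readable by its owner only, so open it up
    # for the web server before moving it into place.
    chmod(0644, $tmp) or die "Cannot chmod $tmp: $!";
    rename($tmp, $path) or die "Cannot rename $tmp to $path: $!";
    return scalar @items;
}

# Made-up items and output path, for illustration only.
my @items = (
    '<LI><A HREF="/story/one">First story</A>',
    '<LI><A HREF="/story/two">Second story</A>',
);
my $out = "topstories.html";
write_fragment($out, @items);
```

In the real script you would push each `<LI>` line onto `@items` instead of printing it, then call the helper once after the loop. On an Apache server with SSI enabled, the page could then pull the file in with something like `<!--#include virtual="/topstories.html" -->`, assuming the file lives under the document root.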

Replies are listed 'Best First'.
RE: Quick Yahoo Top Stories Parser by morbus (Sexton) on Jul 20, 2000 at 00:35 UTC
There's a small bug in this... Where I check for multiple BRs, it should be: `/^<br><br>/`. Minor change, but important for getting all the stories.
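
To see why the anchor matters, here is a small sketch that runs the same loop with both patterns over a few sample lines. The sample lines are invented for the demonstration and are not Yahoo's actual markup, and `titles_with` is a hypothetical wrapper around the loop from the original post. The assumption is that a story link right after the FCEnd marker can carry a trailing `<br><br>` on its own line, in which case the unanchored check swallows it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented sample lines, NOT Yahoo's real markup: the first story after
# the FCEnd marker has a trailing <br><br> on the same line as its link.
my @sample = (
    '<!-- FCEnd -->',
    '<a href="/story/one"><b>First story</b></a><br><br>',
    '<a href="/story/two"><b>Second story</b></a>',
);

# Hypothetical wrapper: run the parsing loop from the original post with
# the given <br><br> pattern and return the titles it finds.
sub titles_with {
    my ($br_pattern, @lines) = @_;
    my ($sawFCEnd, $sawBR) = (0, 0);
    my @titles;
    foreach (@lines) {
        if (/<!-- FCEnd -->/) { $sawFCEnd = 1; next; }
        if ($sawFCEnd && /$br_pattern/) { $sawBR = 1; next; }
        if ($sawBR or /^<a href=/) {
            $sawFCEnd = 0; $sawBR = 0;
            my ($link, $title) = (/<a href="(.*)"><b>(.*)<\/b><\/a>/);
            push @titles, $title if defined $link;
        }
    }
    return @titles;
}

my @loose    = titles_with(qr/<br><br>/,  @sample);
my @anchored = titles_with(qr/^<br><br>/, @sample);

print "unanchored: @loose\n";
print "anchored:   @anchored\n";
```

With these sample lines the unanchored pattern finds only "Second story", because the first story's line matches `<br><br>` and gets skipped by the `next`. The anchored pattern finds both.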
RE: Quick Yahoo Top Stories Parser by morbus (Sexton) on Jul 19, 2000 at 20:27 UTC
Fraggle baggle. This is my post. Forgot to log in first.