Re: Sneeky Snake

Or, a lot shorter (but not any less RAM hungry)...

#!/usr/bin/perl -w

use strict;
use LWP::Simple;

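my $page = get("http://setiathome.ssl.berkeley.edu/stats/country_7.html");
# get() returns undef on failure, so bail out rather than split nothing
die "Could not fetch the stats page\n" unless defined $page;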

my @lines = split /<tr>/, $page;

for (@lines) {
    # Take out the links (for the lines that have 'em)
    # (non-greedy, so two links in one chunk aren't merged into one)
    s/<a.*?>(.*?)<\/a>/$1/g;
    # Take out the silly &nbsp;
    s/&nbsp;//g;
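    # Match the 2 parts you want; skip chunks that don't have both,
    # otherwise $1 and $2 are undefined or left over from an earlier match
    next unless m/<td>(.*?)<\/td>.*?(\d+)/is;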
    # And print it
    print "$1 : $2\n";
}

Works fine, though it's definitely in the "one shot" category: any major change to the web page format will break this program. HTML::TableExtract is a better overall solution, but throwing a hack like this together only takes a few minutes. It's an (easy) example of the general idea of loading in a web page and sucking out the bits that you're interested in.
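For comparison, here is a rough sketch of what the HTML::TableExtract route might look like. It is not tested against the real stats page: the $sample markup and the extract_rows helper are made up for illustration, and the assumption that the name sits in the first cell and the number in the second is a guess at the layout.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample standing in for the real page; the actual
# table layout is an assumption.
my $sample = <<'HTML';
<table>
<tr><td><a href="team_1.html">Team&nbsp;Alpha</a></td><td>1234</td></tr>
<tr><td>Team Beta</td><td>567</td></tr>
</table>
HTML

# Return one [name, number] pair for each row whose second cell
# contains a number.
sub extract_rows {
    my ($html) = @_;

    # No constraints given, so every table in the document is extracted
    my $te = HTML::TableExtract->new;
    $te->parse($html);

    my @rows;
    for my $table ($te->tables) {
        for my $row ($table->rows) {
            my ($name, $count) = @$row;
            next unless defined $name && defined $count;
            next unless $count =~ /(\d+)/;
            my $number = $1;

            # Entities are decoded by default, so &nbsp; arrives as
            # character 0xA0; turn it into a plain space and trim.
            $name =~ s/\xA0/ /g;
            $name =~ s/^\s+|\s+$//g;

            push @rows, [$name, $number];
        }
    }
    return @rows;
}

print "$_->[0] : $_->[1]\n" for extract_rows($sample);
```

To run it against the live page, pass the string returned by LWP::Simple's get() to extract_rows instead of $sample. The module strips the link tags itself, so the link-removal regex isn't needed, and a layout change is more likely to need a tweak to the cell positions than a rewrite of the regexes.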

Gary Blackburn
Trained Killer
