in reply to Screen scraper

OK, below is a copy of the code I have so far. I need to know if I am on the right track, and I need to know one last thing: how to extract the data from the site after opening its links, and how to save it as a .dat file on Linux.
------------------------------------------------------------
    #!/usr/bin/perl -w
    use strict;
    use LWP::Simple;

    my $index = shift;

    #
    ## assuring that the site still exists
    #
    my $base = "http://vortex.plymouth.edu/cgi-bin/gen_uacalplt-u.html";
    my $content = get($base);
    die "Couldn't get it!" unless defined $content;
    print "Found:\n$content\n";

    #
    ## fetching radiosonde data from web
    #
    my @hr = ('00', '12');
    my @urls;
    foreach my $hr (@hr) {
        my ($url) = $content =~ m{
            (http://vortex\.plymouth\.edu/cgi-bin/gen_uacalplt-u\.cgi
             \?id=${index}&pl=none&yy=05&mm=08&dd=24&hh=${hr}
             &pt=parcel&size=640x480)
        }smx;
        push @urls, $url if defined $url;
    }
    print "URLs found: @urls\n";
------------------------------------------------------------
As you can tell from the code, http://vortex.plymouth.edu/uacalplt-u.html is the base site. From there you type in the data:

    KMIA (index for radiosonde data for Miami)
    Sounding data (text) (scroll down)
    2005 (year), Aug (month), 24 (day), 0z (hour)
    parcel
    640x480 (size)

to get the following link:

    http://vortex.plymouth.edu/cgi-bin/gen_uacalplt-u.cgi?id=KMIA&pl=none&yy=05&mm=08&dd=24&hh=00&pt=parcel&size=640x480

My logic was to open up the base site first and confirm that it still exists, hence the print. Then, using the foreach loop (to cover the 00z and 12z hours) and the station index from the command line ($index = shift;), I open up this link. Now I would like to save all the data that appears above "Sounding variables and indices" into a text file titled "$index_2005_237_$hr.dat". My question is how to do that. I would greatly appreciate your help. Please email me back as soon as possible.
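Since all of the query-string parameters are fixed except the station index and the hour, the two URLs and their matching filenames can also be built directly rather than pattern-matched out of the base page. A minimal sketch of that construction (the helper names sounding_url and dat_filename are mine, not part of the code above; 237 is the day-of-year for Aug 24, as used in the filename scheme above):

    use strict;
    use warnings;

    # Hypothetical helper: build the sounding URL for a station and hour.
    sub sounding_url {
        my ($index, $hr) = @_;
        return "http://vortex.plymouth.edu/cgi-bin/gen_uacalplt-u.cgi"
             . "?id=$index&pl=none&yy=05&mm=08&dd=24&hh=$hr&pt=parcel&size=640x480";
    }

    # Hypothetical helper: build the matching output filename.
    sub dat_filename {
        my ($index, $hr) = @_;
        return sprintf "%s_2005_237_%s.dat", $index, $hr;  # 237 = day-of-year
    }

    for my $hr ('00', '12') {
        print sounding_url('KMIA', $hr), " -> ", dat_filename('KMIA', $hr), "\n";
    }

Note the hours are kept as the strings '00' and '12'; the numeric literal 00 would collapse to 0 and produce hh=0 in the URL.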

Replies are listed 'Best First'.
Re: screen scraper help
by petdance (Parson) on Jan 10, 2007 at 06:07 UTC
    You are doing what WWW::Mechanize was designed for. It fetches pages and parses them, so you can easily do something like:
        my $mech = WWW::Mechanize->new();
        $mech->get( "http://somesite.com" );
        my @links = $mech->links;
        for my $link ( @links ) {
            # do whatever
        }
    WWW::Mechanize will be your friend. Trust me.
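    For the specific task in the question, a hedged sketch of how that might look (the marker string and the file-naming scheme are taken from the question above; the helper name save_sounding is made up for illustration, and this assumes the heading appears verbatim in the page text):

        use strict;
        use warnings;
        use WWW::Mechanize;   # CPAN module, must be installed

        # Fetch one sounding page and save everything above the
        # "Sounding variables and indices" heading to a .dat file.
        sub save_sounding {
            my ($url, $file) = @_;
            my $mech = WWW::Mechanize->new();
            $mech->get($url);                        # dies on failure by default
            my ($wanted) = split /Sounding variables and indices/,
                                 $mech->content, 2;  # keep text before the marker
            open my $fh, '>', $file or die "Can't write $file: $!";
            print {$fh} $wanted;
            close $fh;
        }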

    xoxo,
    Andy

Re: screen scraper help
by diotalevi (Canon) on Jan 10, 2007 at 06:05 UTC

    Read the documentation for the module you're already using.

        use LWP::Simple 'getstore';
        getstore( $url, $filename );

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊