Re^4: Web Scraping on CGI Scripts?

Hello Again

WWW::Mechanize does seem to be the right medicine, but I've already hit a snag. I'm only interested in following the 'motion.cgi' links and extracting the linked pages as text documents; however, the regex I've used only seems to find the first 2 links. Any ideas on what's going on?


#!/usr/bin/perl
use strict;
use WWW::Mechanize;
use Storable;

my $mech_cgi = WWW::Mechanize->new;

$mech_cgi->get( 'http://www.molmovdb.org/cgi-bin/browse.cgi' );

my @cgi_links = $mech_cgi->find_all_links( url_regex => qr/motion.cgi?/ );

for(my $i = 0; $i < @cgi_links; $i++) { 
    print "following link: ", $cgi_links[$i]->url, "\n";
    $mech_cgi->follow_link( url => $cgi_links[$i]->url )
      or die "Error following link ", $cgi_links[$i]->url;

}

best wishes

Dan

Re^5: Web Scraping on CGI Scripts? by tospo (Hermit) on Oct 13, 2011 at 08:56 UTC
That's because after the first "follow_link" call, $mech_cgi is on a different page (it behaves like a browser). When you then issue the next follow_link, that link doesn't exist on the page you are now on. Add "$mech_cgi->back" before the end of the loop and you will iterate through all the links.
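Something along these lines might do it. This is only a rough sketch: visit_each_link is a name made up for illustration, and it assumes $mech is your WWW::Mechanize object and @links is what find_all_links returned for the index page.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Mechanize): follow each link in turn,
# stepping back to the index page after every visit.
# $mech  : a WWW::Mechanize object currently on the index page
# @links : WWW::Mechanize::Link objects found on that page
sub visit_each_link {
    my ($mech, @links) = @_;
    my @visited;

    for my $link (@links) {
        print "following link: ", $link->url, "\n";
        $mech->follow_link( url => $link->url )
            or die "Error following link ", $link->url, "\n";

        # ... save or process the page content here ...

        push @visited, $link->url;

        # Return to the index page so the next link can be found on it.
        $mech->back;
    }

    return @visited;
}
```

With the variables from your script the call would be visit_each_link($mech_cgi, @cgi_links). The only real change from your loop is the $mech->back at the end of each pass.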