The website I am trying to scrape has some links that can't be followed because of a server issue. When I iterate through the links on the page, the program crashes on these 'down' links. What is the best way to test these links before I follow them and extract the URL?
An example of the down links is as follows: http://www.molmovdb.org/cgi-bin/motion.cgi?ID=ppar
Would I need Test::WWW::Mechanize to test before following? Also, is it possible to call get($url) in a loop over the links? Everything I've tried so far doesn't allow me to do so; it wants an absolute URL.
Using the following code:
#!/usr/bin/perl
use strict;
use WWW::Mechanize;
use Storable;

my $mech_cgi = WWW::Mechanize->new;
$mech_cgi->get( 'http://www.molmovdb.org/cgi-bin/browse.cgi' );

my @cgi_links = $mech_cgi->find_all_links( url_regex => qr/motion.cgi/ );

for ( my $i = 0; $i < @cgi_links; $i++ ) {
    print "following link: ", $cgi_links[$i]->url, "\n";
    $mech_cgi->follow_link( url => $cgi_links[$i]->url )
        or die "Error following link ", $cgi_links[$i]->url;
    $mech_cgi->back;
}
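For what it's worth, the kind of pre-check I have in mind looks roughly like the sketch below. It is untested and just an illustration of the idea: it assumes a second Mechanize object created with autocheck => 0 (so a failed GET sets an error status instead of dying) can be used to probe each link, and uses the link's url_abs method to get an absolute URL for get().

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Main object fetches the index page as before.
my $mech = WWW::Mechanize->new;
$mech->get('http://www.molmovdb.org/cgi-bin/browse.cgi');

# Second object with autocheck disabled, so a bad response
# does not kill the script and we can inspect the status.
my $probe = WWW::Mechanize->new( autocheck => 0 );

my @cgi_links = $mech->find_all_links( url_regex => qr/motion\.cgi/ );

for my $link (@cgi_links) {
    my $url = $link->url_abs;    # absolute URL, usable directly in get()
    $probe->get($url);
    if ( $probe->success ) {
        print "OK:   $url\n";
        # ... extract whatever is needed from $probe->content here ...
    }
    else {
        warn "SKIP: $url (status ", $probe->status, ")\n";
    }
}

Is something along these lines the right approach, or is Test::WWW::Mechanize a better fit for this?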
many thanks and best wishes
Dan