fraizerangus has asked for the wisdom of the Perl Monks concerning the following question:
The website I am trying to scrape has some links which can't be followed because of a server issue; when I iterate through the links on the page, the program crashes because of these 'down' links. What is the best way of testing these links before I follow them and extract the URL?
An example of the down links is the following: http://www.molmovdb.org/cgi-bin/motion.cgi?ID=ppar
Would I need Test::WWW::Mechanize to test before following? Also, is it possible to call get($url) in a loop over the links? Everything I've tried so far doesn't allow me to do so; it wants an absolute URL.
Using the following code:
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use Storable;

my $mech_cgi = WWW::Mechanize->new;
$mech_cgi->get( 'http://www.molmovdb.org/cgi-bin/browse.cgi' );

my @cgi_links = $mech_cgi->find_all_links( url_regex => qr/motion\.cgi/ );

for ( my $i = 0; $i < @cgi_links; $i++ ) {
    print "following link: ", $cgi_links[$i]->url, "\n";
    $mech_cgi->follow_link( url => $cgi_links[$i]->url )
        or die "Error following link ", $cgi_links[$i]->url;
    $mech_cgi->back;
}
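[Editor's note: one possible approach, shown only as a minimal sketch rather than a definitive fix. It reuses the URL and regex from the question; the rest is an assumption about how you might restructure the loop. WWW::Mechanize dies on 4xx/5xx responses by default, so turning autocheck off and inspecting the status yourself avoids the crash, and url_abs() on each link gives get() the absolute URL it wants.]

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# autocheck => 0 stops Mechanize from die()-ing on 4xx/5xx responses,
# so a broken link can be skipped instead of crashing the program.
my $mech = WWW::Mechanize->new( autocheck => 0 );

$mech->get( 'http://www.molmovdb.org/cgi-bin/browse.cgi' );
die "Could not fetch the index page" unless $mech->success;

my @cgi_links = $mech->find_all_links( url_regex => qr/motion\.cgi/ );

for my $link (@cgi_links) {
    # url_abs() resolves the link against the page's base URL,
    # which avoids the "wants an absolute URL" problem with get().
    my $url  = $link->url_abs;
    my $resp = $mech->get($url);

    if ( $resp->is_success ) {
        print "OK   $url\n";
        # ... extract whatever is needed from $mech->content here ...
    }
    else {
        warn "SKIP $url (status ", $resp->code, ")\n";
    }
    # No back() needed: the links were collected up front and each one
    # is fetched by its absolute URL, so we never navigate relative to
    # the index page.
}

Test::WWW::Mechanize shouldn't be necessary for this; it is aimed at writing test scripts, and checking $mech->success (or the response's is_success/code) after each get() is enough to detect and skip the error-500 pages.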
Many thanks and best wishes,
Dan
Replies are listed 'Best First'.

Re: conditional testing for error 500 webpages before following them?
by Perlbotics (Archbishop) on Oct 15, 2011 at 14:19 UTC

Re: conditional testing for error 500 webpages before following them?
by roboticus (Chancellor) on Oct 15, 2011 at 14:09 UTC

Re: conditional testing for error 500 webpages before following them?
by Marshall (Canon) on Oct 15, 2011 at 16:18 UTC

Re: conditional testing for error 500 webpages before following them?
by Anonymous Monk on Oct 15, 2011 at 14:47 UTC