sivel has asked for the wisdom of the Perl Monks concerning the following question:

I am writing a script to check whether each URL in a list exists. I have been using something like the following:

    use LWP::Simple;

    my $url = 'http://www.domain.com/';
    if (head($url)) {
        print "Does exist\n";
    }
    else {
        print "Does not exist\n";
    }

The above works very well for me except for one thing. If my internet connection drops for a minute, or the site takes a really long time to respond, my script hangs until it times out. I wanted to implement a shorter timeout but found that you can only do this with LWP::UserAgent.

Reading some documentation, I can do this, but I seem to be having problems verifying whether the data I get back means the site exists. I also need to be able to check HTTP URLs as well as FTP URLs. I would prefer to use HEAD instead of GET, as some of these pages are very large. I need to check somewhere on the order of 20 pages on about 200 URLs, so this process takes a while, and using a GET for all of these pages would just make it take that much longer.

Can anyone provide me with some assistance and possibly a sample script?

Thanks!

Replies are listed 'Best First'.
Re: Check if URL exists
by jhourcle (Prior) on May 31, 2007 at 17:09 UTC

    I don't know exactly what you're trying to do, but as you mention:

    somewhere on the order of 20 pages on about 200 URLs

    It sounds to me like you're just trying to watch pages that you're linking to and check whether they still resolve. If that's the case, you might try one of the specialized scripts that exist for this sort of thing, such as linklint, or search for 'link checker' with your favorite internet search engine.

    ...

    As for other approaches: if the pages are static, you can use GET but send an 'If-Modified-Since' header set to the current time, so that an unchanged page should come back as a 304 Not Modified with no body. Otherwise, you can start retrieving a document and then close the connection. (I don't know of any HTTP clients for Perl that do this; you'd likely have to use IO::Socket and write your own.)
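
A minimal sketch of the conditional GET idea above, assuming LWP::UserAgent and HTTP::Date are installed. The helper name `exists_from_response` and the 10-second timeout are illustrative choices, not something from the post. A 304 is counted as "exists" because the server recognised the resource; servers that ignore the header will simply send the full page.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Date qw(time2str);

# Hypothetical helper: treat 2xx and 304 Not Modified as "the page exists".
sub exists_from_response {
    my ($res) = @_;
    return ($res->is_success || $res->code == 304) ? 1 : 0;
}

my $ua = LWP::UserAgent->new(timeout => 10);    # timeout value is an assumption

# URLs are taken from the command line; nothing is fetched if none are given.
for my $url (@ARGV) {
    # Ask only for content newer than "now"; an unchanged static page
    # should answer 304 without sending a body.
    my $res = $ua->get($url, 'If-Modified-Since' => time2str(time));
    printf "%s: %s (%s)\n", $url,
        exists_from_response($res) ? 'exists' : 'not found or failed',
        $res->status_line;
}
```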

Re: Check if URL exists
by varian (Chaplain) on Jun 01, 2007 at 06:44 UTC
    sivel, from within LWP::Simple you can still use other LWP methods; just import the useragent object for this purpose.

    So in your example, to set a timeout of 10 seconds:

    use LWP::Simple qw($ua head);

    $ua->timeout(10);    # seconds

    my $url = 'http://www.domain.com/';
    if (head($url)) {
        print "Does exist\n";
    }
    else {
        print "Does not exist or timeout\n";
    }

    NB: LWP::Simple's head() and get() functions do not distinguish between 'does not exist' and 'timeout'; they report failure the same way in both situations.
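
If the distinction matters, one possible approach is to call LWP::UserAgent directly and inspect the response object. The sketch below relies on LWP adding a `Client-Warning: Internal response` header to error responses it generates itself (timeouts, failed connections) rather than ones received from a server. The helper name `classify_response` and its three labels are made up for illustration.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: sort a response into 'exists', 'unreachable' or 'missing'.
sub classify_response {
    my ($res) = @_;
    return 'exists' if $res->is_success;

    # LWP marks responses it generated itself (timeout, connection
    # failure) with this header; a real server reply will not carry it.
    my $warning = $res->header('Client-Warning') // '';
    return 'unreachable' if $warning eq 'Internal response';

    return 'missing';
}

my $ua = LWP::UserAgent->new(timeout => 10);

# URLs are taken from the command line; nothing is fetched if none are given.
for my $url (@ARGV) {
    my $res = $ua->head($url);
    printf "%s: %s (%s)\n", $url, classify_response($res), $res->status_line;
}
```

Note that 'unreachable' covers any client-side failure, not only timeouts; the status line printed alongside usually says which one occurred.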

      This (importing the useragent object) worked perfectly. I wasn't sure whether I needed to know if it was a timeout or a non-existent page... I decided I did, and implemented it the easy way: set one variable to the start time and one to the end time, then subtract.

      Thanks again!
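
For reference, the start/end subtraction described above might look roughly like the following. This is a guess at the approach, not sivel's actual code. The helper `looks_like_timeout` is hypothetical, and the comparison is only a heuristic, since a slow but successful request or an early connection failure can blur the line.

```perl
use strict;
use warnings;
use LWP::Simple qw($ua head);
use Time::HiRes qw(time);    # sub-second resolution for time()

my $timeout = 10;            # seconds; same value as in the reply above
$ua->timeout($timeout);

# Hypothetical helper: a failure that took at least as long as the
# timeout is assumed to have been a timeout.
sub looks_like_timeout {
    my ($elapsed, $limit) = @_;
    return $elapsed >= $limit ? 1 : 0;
}

# URLs are taken from the command line; nothing is fetched if none are given.
for my $url (@ARGV) {
    my $start   = time;
    my $found   = head($url);    # true on success, undef on any failure
    my $elapsed = time - $start;

    if    ($found)                                  { print "$url: exists\n" }
    elsif (looks_like_timeout($elapsed, $timeout))  { print "$url: timed out\n" }
    else                                            { print "$url: does not exist\n" }
}
```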