Nygeve has asked for the wisdom of the Perl Monks concerning the following question:

Good day, Monks.
I'm trying to make a link checker using LWP. Here's my "main" part:
use LWP;

my $Browser = LWP::UserAgent->new;
$Browser->agent("Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)");
$Browser->timeout(10);

my $url = 'http://perlmonks.org';
my $Response = $Browser->head($url);
if ($Response->is_success) {
    # OK
} else {
    # BAD
}
Everything is fine as long as the HEAD request works. For example, this cool HP link can't be fetched with a HEAD request: the server sends 500 (Internal Server Error). So it seems that I have to use GET. But I don't want to generate that much traffic (there will be ~1500 links that must be checked every day).
So far I have found the max_size($bytes) property of LWP::UserAgent, but it's not working (the whole page is still loaded by Perl).

Is there any other way (without direct usage of sockets)?
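
One possibility that stays inside LWP, sketched here under assumptions rather than taken from the thread: pass a `:content_cb` callback to `get` and `die` inside it. LWP aborts the transfer when the callback dies and records the message in an `X-Died` header, while the status code from the server is kept. The sub name `status_without_body` is hypothetical.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Issue a GET, but abort as soon as the first chunk of the body arrives.
# Returns the HTTP::Response; its code is the one the server sent.
# status_without_body is a hypothetical helper name, not part of LWP.
sub status_without_body {
    my ($ua, $url) = @_;
    return $ua->get(
        $url,
        ':read_size_hint' => 1024,                        # ask for small chunks
        ':content_cb'     => sub { die "enough\n" },      # stop at first chunk
    );
}

# Usage (performs a real request, so it is left commented out):
# my $ua = LWP::UserAgent->new(timeout => 10);
# my $response = status_without_body($ua, 'http://perlmonks.org');
# print $response->code, "\n";
```

If the body is empty the callback never runs, so the `X-Died` header is only present when a download was actually cut short.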

Replies are listed 'Best First'.
Re: Checking page existance
by Zaxo (Archbishop) on Nov 03, 2003 at 03:47 UTC

    Many sites refuse HEAD requests for no better reason than that they've seen it done that way or inherited it in a canned .htaccess file. Your only recourse is to record which sites don't respond and GET for just those.

    After Compline,
    Zaxo
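
The advice above (remember which sites refuse HEAD and use GET only for those) could be sketched roughly as follows. This is a minimal illustration, not Zaxo's code: `check_link` and `%head_refused` are hypothetical names, and it assumes every URL is an absolute http or https URL so that `URI` can extract a host.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use URI;

# Hosts where HEAD failed but GET succeeded (hypothetical in-memory cache;
# it could be saved to disk between daily runs).
my %head_refused;

# Try HEAD first; fall back to GET only when HEAD fails or when the host
# is already known to refuse HEAD. Returns the final HTTP::Response.
sub check_link {
    my ($ua, $url) = @_;
    my $host = URI->new($url)->host;    # assumes an absolute http(s) URL

    unless ($head_refused{$host}) {
        my $head = $ua->head($url);
        return $head if $head->is_success;
    }

    my $get = $ua->get($url);
    # GET worked although HEAD failed or was skipped: remember this host.
    $head_refused{$host} = 1 if $get->is_success;
    return $get;
}

# Usage (performs real requests, so it is left commented out):
# my $ua = LWP::UserAgent->new(timeout => 10);
# print check_link($ua, 'http://perlmonks.org')->status_line, "\n";
```

A host is only marked when GET succeeds, so a link that is broken for both methods does not force GET onto every other link on the same host.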

Re: Checking page existance
by pg (Canon) on Nov 03, 2003 at 04:10 UTC

    In this case the best way is to use a socket directly, and I don't know why you want to exclude it ;-) This is not reinventing the wheel, as you want something that no existing module delivers directly (at least to your knowledge and mine).

    Anyway, the easiest way is to use a socket, and you don't need to waste much bandwidth. Don't even bother with a HEAD request: just send a GET request, read the first several bytes back to get the response code, close the socket, and go on to check the next link.
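
The socket approach described above might look something like this. It is only a sketch under assumptions: plain HTTP without TLS or redirects, and the names `parse_status_line` and `socket_status` are hypothetical. It uses the core `IO::Socket::INET` module.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Pull the three-digit code out of a status line such as "HTTP/1.1 200 OK".
# Returns undef if the line does not look like an HTTP status line.
sub parse_status_line {
    my ($line) = @_;
    return unless defined $line;
    return $line =~ m{^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b} ? $1 : undef;
}

# Send a GET, read only the first line of the reply, then close the socket.
# Returns the status code, or undef if the connection or the parse fails.
sub socket_status {
    my ($host, $port, $path) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => 10,          # applies to the connect, not to later reads
    ) or return;

    print {$sock} "GET $path HTTP/1.0\r\nHost: $host\r\nConnection: close\r\n\r\n";
    my $status_line = <$sock>;   # first line of the response
    close $sock;
    return parse_status_line($status_line);
}

# Usage (opens a real connection, so it is left commented out):
# my $code = socket_status('perlmonks.org', 80, '/');
# print defined $code ? "status $code\n" : "no response\n";
```

Closing the socket right after the status line means only whatever the server had already sent is transferred, rather than the whole page.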

      I just thought that there was a simpler way and... I like LWP :-)
      Thanks, will use sockets.
•Re: Checking page existance
by merlyn (Sage) on Nov 03, 2003 at 12:04 UTC