Ineffectual has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Monks, I have a script that uses WWW::Mechanize to check links on a page. One of those links happens to be to a very large PDF file. I want to check that the PDF is there, but I don't want to have my test download the entire PDF file because it takes a long time. My script currently does:
    foreach my $link (@all_links) {
        my $link_url = $link->url;
        next unless ( $link_url =~ /http/ );
        next if ( $link_url =~ /$url_root/ );
        warn $link_url."\n";
        my $out = $browser->get($link_url);

        # $success is 1 if a successful HTTP status code (2xx) is returned
        my $success = $out->is_success;
        ok($success, "is_success");

        # redirect is 1 if a redirection HTTP status code (3xx) was returned
        # if a redirect is seen, may need to change link to a new url
        my $redirect = $out->is_redirect;
        ok(!$redirect, "! is_redirect");

        my $error = $out->is_error;
        # print errors
        if ($error) {
            my $status = $out->error_as_HTML;
            warn "The error is:\n$status\n";
        }
    }

Replies are listed 'Best First'.
Re: WWW::Mechanize and PDF files
by Fletch (Bishop) on Mar 12, 2010 at 21:15 UTC

    Another possibility: since a Mechanize instance isa LWP::UserAgent, you could call the head method instead of get when the link URL matches /\.pdf$/ and check whether that request is successful.
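A minimal sketch of that suggestion follows. The helper names is_pdf_link and check_link are made up for illustration and do not come from the original script. The only thing assumed about $ua is that it has LWP::UserAgent-style head() and get() methods, which a WWW::Mechanize instance inherits.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the URL path ends in ".pdf" (case-insensitive),
# ignoring any trailing query string or fragment.
sub is_pdf_link {
    my ($url) = @_;
    return $url =~ /\.pdf(?:[?#].*)?$/i ? 1 : 0;
}

# Hypothetical helper: send HEAD for PDF links and GET for everything else.
# $ua is any object with LWP::UserAgent-style head() and get() methods,
# for example a WWW::Mechanize instance. Returns whatever the method returns
# (an HTTP::Response when $ua is a real user agent).
sub check_link {
    my ($ua, $url) = @_;
    return is_pdf_link($url) ? $ua->head($url) : $ua->get($url);
}
```

In the loop from the question, this would replace the get call with something like my $out = check_link($browser, $link_url); and the existing is_success and is_redirect checks would stay as they are. If the Mechanize object was created with autocheck enabled, a failed request is expected to die rather than return a response, so constructing it with autocheck => 0 is probably what you want for this kind of test.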

    The cake is a lie.

      ++

      This is a much better solution, since that's exactly what the HEAD request is for. I can't believe I forgot about it!

      Thanks very much, that works great! :)
Re: WWW::Mechanize and PDF files
by lostjimmy (Chaplain) on Mar 12, 2010 at 21:11 UTC
    I'm not particularly familiar with WWW::Mechanize, but you can always use the Range HTTP header to request only a small number of bytes, e.g. Range: bytes=0-9
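Here is a rough sketch of the Range approach, assuming $ua is a WWW::Mechanize or LWP::UserAgent object, whose get() accepts extra header name/value pairs after the URL. The helper names probe_range and range_honored are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: ask the server for only bytes $first..$last of $url
# by passing a Range header along with the GET request.
sub probe_range {
    my ($ua, $url, $first, $last) = @_;
    return $ua->get($url, Range => "bytes=${first}-${last}");
}

# Hypothetical helper: true only if the server answered 206 Partial Content,
# meaning it honored the Range header.
sub range_honored {
    my ($response) = @_;
    return $response->code == 206 ? 1 : 0;
}
```

A server is allowed to ignore the Range header and reply with 200 and the full body, in which case the whole PDF is still downloaded. Checking for 206 as above at least tells you which case you hit. Note that is_success is true for both 200 and 206.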