thatguy has asked for the wisdom of the Perl Monks concerning the following question:

I wrote a routine to verify links that are submitted to my webpage before posting them and I was wondering if there was a way to limit the amount of data that it retrieves? For example, if someone links directly to a 17MB avi, it tries to pull the whole avi in before it can verify it.

TIA, Jack.

    sub check_url {
        my $return;
        my $url = shift;
        chomp($url);
        #$url =~ s/^http:\/\///i;
        #$url = "http://$url";
        my $ua = LWP::UserAgent->new;
        $ua->agent("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)");

        # Create a request
        my $req = HTTP::Request->new(GET => $url);

        # Pass request to the user agent and get a response back
        my $res = $ua->request($req);

        # Check the outcome of the response
        unless ($res->is_success) {
            $return = "404";
        }
        return $return;
    }

Re: Limiting the size of a HTTP::Request
by merlyn (Sage) on Mar 06, 2003 at 19:22 UTC
    From LWP::UserAgent:
    $ua = LWP::UserAgent->new( %options );

        This class method constructs a new "LWP::UserAgent" object and
        returns a reference to it. Key/value pair arguments may be
        provided to set up the initial state of the user agent. The
        following options correspond to attribute methods described below:

           KEY                     DEFAULT
           -----------             --------------------
           agent                   "libwww-perl/#.##"
           from                    undef
           timeout                 180
           use_eval                1
           parse_head              1
           max_size                undef
           cookie_jar              undef
           conn_cache              undef
           protocols_allowed       undef
           protocols_forbidden     undef
           requests_redirectable   ['GET', 'HEAD']

        The following options are also accepted: If the "env_proxy"
        option is passed in and has a TRUE value, then proxy settings are
        read from environment variables. If the "keep_alive" option is
        passed in, then a "LWP::ConnCache" is set up (see conn_cache()
        method below). The keep_alive value is a number and is passed on
        as the total_capacity for the connection cache. The "keep_alive"
        option also has the effect of loading and enabling the new
        experimental HTTP/1.1 protocol module.

    Notice the max_size parameter. There's also:

    $ua->max_size([$bytes])

        Get/set the size limit for response content. The default is
        "undef", which means that there is no limit. If the returned
        response content is only partial, because the size limit was
        exceeded, then a "Client-Aborted" header will be added to the
        response.
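
As a rough illustration of the max_size option quoted above, here is a minimal sketch of a size-capped link check. The sub name check_url_capped, the 10 KB default cap, and the 30 second timeout are assumptions made for the example, not part of the original routine.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: fetch at most $max_bytes of the body.
# Returns (ok, truncated), both as 1 or 0.
sub check_url_capped {
    my ($url, $max_bytes) = @_;
    $max_bytes = 10_240 unless defined $max_bytes;   # assumed default cap

    my $ua = LWP::UserAgent->new(
        max_size => $max_bytes,   # stop collecting content past this size
        timeout  => 30,           # assumed timeout, in seconds
    );
    my $res = $ua->get($url);

    # LWP adds a Client-Aborted header when the body was cut short.
    my $truncated = defined $res->header('Client-Aborted') ? 1 : 0;

    return ($res->is_success ? 1 : 0, $truncated);
}
```

A call would look like my ($ok, $cut) = check_url_capped($url, 2048); the status line and headers still arrive even when the body is cut short, so is_success remains usable for the link check.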

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      Excellent! I appreciate your help.
Re: Limiting the size of a HTTP::Request
by hossman (Prior) on Mar 06, 2003 at 19:01 UTC
    From LWP::Simple ...
         head($url)
            Get document headers. Returns the following 5 values if
            successful:  ($content_type, $document_length,
            $modified_time, $expires, $server)
    
            Returns an empty list if it fails.  In scalar context
            returns TRUE if successful.
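
To show how the five return values of head() might be used for a link check, here is a small sketch. The sub name link_info and the shape of the returned hash are assumptions made for illustration.

```perl
use strict;
use warnings;
use LWP::Simple qw(head);

# Hypothetical helper: returns a hash ref of header data,
# or nothing (undef in scalar context) if the HEAD request fails.
sub link_info {
    my ($url) = @_;

    # head() returns an empty list on failure, so the list
    # assignment evaluates to 0 and we return early.
    my ($type, $length, $mtime, $expires, $server) = head($url)
        or return;

    return {
        type   => $type,     # Content-Type, if the server sent one
        length => $length,   # Content-Length, may be undef
        mtime  => $mtime,    # Last-Modified as epoch seconds, may be undef
    };
}
```

Since only headers are requested, a link to a 17MB avi costs about the same as a link to a small page, and the reported length can be inspected before deciding whether to accept the link.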
    
      I'm not using LWP::Simple to get the request. Is there a way to do that in HTTP::Request?

      I do understand that I could simply rewrite the code to use another method, and I probably will. It's academic now; I just want to know.

        The key point I was trying to make is that if your motivation for limiting the size is that you only want to make sure the links are valid (or check the size of the remote content, or find out if it has moved, etc.), you may want to rethink your problem.

        A GET request is designed to return all of the data, whereas a HEAD request is designed only to ask the server whether the item exists and to get back basic metadata about it. You could change a single line of your existing code (replace "GET" with "HEAD") and achieve your desired result. Or you could eliminate most of your code altogether and reuse the existing functionality provided by LWP.

        (As merlyn pointed out, it is definitely possible to limit the response size you are willing to receive ... but why make the remote server send you any data that you don't actually want? All you want is the headers, so use a "HEAD" request and ask for only the headers.)
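
The one-line change described above could look roughly like this. It is a sketch that keeps the HTTP::Request approach from the original routine; the sub name check_url_head, the 30 second timeout, and the 1/0 return convention are assumptions for the example (the original returned "404" on any failure and undef on success).

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Hypothetical variant of check_url that asks for headers only.
# Returns 1 if the server answers with a success status, 0 otherwise.
sub check_url_head {
    my ($url) = @_;
    chomp($url);

    my $ua = LWP::UserAgent->new(timeout => 30);   # assumed timeout

    # HEAD instead of GET: the server sends the status line and
    # headers, but no response body.
    my $req = HTTP::Request->new(HEAD => $url);
    my $res = $ua->request($req);

    return $res->is_success ? 1 : 0;
}
```

Some servers may not answer HEAD the same way they answer GET, so one possible fallback (not shown) is to retry a failed HEAD with a GET that has max_size set.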

Re: Limiting the size of a HTTP::Request
by phydeauxarff (Priest) on Mar 06, 2003 at 19:08 UTC