mellin has asked for the wisdom of the Perl Monks concerning the following question:

I just wrote a script that parses a given text file for URIs and downloads those files to my local computer. The problem is that I would rather be able to choose whether or not to download a remote file when it is bigger than I expected.

So, let's say there's a remote image file as big as 20MB, and the URI of this file is http://www.boo.net/images/image.jpg. Before my script starts to download this file with LWP::Simple, I want it to find out the file size and notify me if the size is, say, bigger than 2MB.

How can I determine remote file size over HTTP?

Re: Determining file size over HTTP connection
by Joost (Canon) on May 15, 2005 at 19:03 UTC
    Most webservers support HEAD requests for most resources. The response to a HEAD request should be the headers for that resource without the content (response body). Most resources return a Content-length header.

    Note that this is not fool-proof, but it should work on most servers and urls that reference static files.

    If you have the lwp command line tools installed, you can do:

    > lwp-request -m HEAD http://example.com
    on the command line to see the server's response.

    LWP::Simple has a head() function, which returns the content length as the second element of the list it returns.

Re: Determining file size over HTTP connection
by brian_d_foy (Abbot) on May 15, 2005 at 19:06 UTC

    For this task I wrote HTTP::Size.

    --
    brian d foy <brian@stonehenge.com>
Re: Determining file size over HTTP connection
by eibwen (Friar) on May 15, 2005 at 19:07 UTC

    Presuming the server you're downloading from is sending the proper headers, it should send a Content-length: header. From the LWP::Simple POD:

    head($url)
        Get document headers. Returns the following 5 values if successful: ($content_type, $document_length, $modified_time, $expires, $server)

        Returns an empty list if it fails. In scalar context returns TRUE if successful.

    Therefore to make a determination based on filesize:

    use LWP::Simple ();
    my $content_length = (LWP::Simple::head($url))[1];  # undef if the request fails or no length is sent
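    Putting this together for the original question, a minimal sketch might look like the following. The 2MB limit comes from the question itself; the helper names size_verdict and confirm, the prompt wording, and the use of the URL's last path segment as the local filename are assumptions made for illustration. Because a server may omit Content-Length, the sketch treats a missing length as "unknown" and asks in that case too.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;    # exports head, getstore and is_success by default

my $LIMIT = 2 * 1024 * 1024;    # 2MB threshold from the question

# Hypothetical helper: classify a reported length against a limit.
# Returns 'unknown' when no usable Content-Length was reported.
sub size_verdict {
    my ($length, $limit) = @_;
    return 'unknown' unless defined $length && $length =~ /^\d+$/;
    return $length > $limit ? 'big' : 'ok';
}

# Hypothetical helper: ask on STDIN; anything but y/Y counts as "no".
sub confirm {
    my ($question) = @_;
    print "$question [y/N] ";
    my $answer = <STDIN>;
    return defined $answer && $answer =~ /^y/i;
}

# The network is only touched when URLs are given on the command line.
for my $url (@ARGV) {
    my @headers = head($url);    # empty list if the HEAD request failed
    my $verdict = size_verdict($headers[1], $LIMIT);

    if ($verdict eq 'big') {
        next unless confirm("$url is $headers[1] bytes. Download anyway?");
    }
    elsif ($verdict eq 'unknown') {
        next unless confirm("Size of $url is unknown. Download anyway?");
    }

    # Assumption: name the local file after the last path segment.
    my ($file) = $url =~ m{([^/?#]+)(?:[?#].*)?$};
    $file = 'index.html' unless defined $file;

    my $status = getstore($url, $file);
    warn "Failed to fetch $url (HTTP $status)\n" unless is_success($status);
}
```

    Run it with one or more URLs as arguments. Files at or below the limit are fetched without asking; anything larger, or of unknown size, waits for a y/n answer.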

Re: Determining file size over HTTP connection
by gaal (Parson) on May 15, 2005 at 19:09 UTC
    From RFC 2616, HTTP/1.1, section 14.13:

    The Content-Length entity-header field indicates the size of the entity-body, in decimal number of OCTETs, sent to the recipient or, in the case of the HEAD method, the size of the entity-body that would have been sent had the request been a GET.

    [...]

    Applications SHOULD use this field to indicate the transfer-length of the message-body, unless this is prohibited by the rules in section 4.4.

    [...]

    Note that the meaning of this field is significantly different from the corresponding definition in MIME, where it is an optional field used within the "message/external-body" content-type. In HTTP, it SHOULD be sent whenever the message's length can be determined prior to being transferred, unless this is prohibited by the rules in section 4.4.

    Check with your LWP docs to see how to access the header.
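
    As one possible way to access the header, here is a rough sketch using LWP::UserAgent rather than LWP::Simple. The helper name reported_length and the 30-second timeout are assumptions for illustration. Since the RFC only says the header SHOULD be sent, the helper returns undef when it is absent or when the request did not succeed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;

# Hypothetical helper: return the Content-Length reported by a response,
# or undef if the request failed or the header is missing.
sub reported_length {
    my ($response) = @_;
    return undef unless $response->is_success;
    return scalar $response->header('Content-Length');
}

# The network is only touched when a URL is given on the command line.
if (@ARGV) {
    my $url      = $ARGV[0];
    my $ua       = LWP::UserAgent->new(timeout => 30);
    my $response = $ua->head($url);    # HEAD request: headers only, no body
    my $length   = reported_length($response);

    if (defined $length) {
        print "$url: $length bytes\n";
    }
    else {
        print "$url: size not reported (", $response->status_line, ")\n";
    }
}
```

    Working with the HTTP::Response object also gives access to the status line, which helps tell "the server did not send a length" apart from "the request failed".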