schalker has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, I want to (legally) download MP3s via HTTP. I use an LWP user agent and get it with the request method. But then I want to check if the complete file has been downloaded. So I do
    $ua = LWP::UserAgent->new;
    # [..] cookies setting skipped
    $request = HTTP::Request->new("HEAD");
    $request->url("http://www.emusic.com/whatever.mp3"); # replace with real MP3 file
    $response = $ua->request($request);
    print "dooh" if !defined($response->headers->content_length);
but unfortunately the content-length header is missing in this particular example. So my question: is there any other way of checking the size of a remote file via HTTP? Am I missing something? TIA!

Re: missing content-length
by arhuman (Vicar) on Jan 22, 2001 at 20:53 UTC
    It's hard to answer as I can't reproduce it (OK, OK, it's also because I'm just an initiate... ;-)
    It works fine for me: I get the content-length.
    Does it always fail, just on one file, or (apparently) randomly?
    Is there a possibility that you mistyped the filename or something like that?
    (Don't feel offended!)

    Anyway, I've slightly modified your code to check other things (LWP debug output plus the full headers, to see what was returned). What's the result when you execute this?
    (Replace the filename.)
        require LWP::UserAgent;
        use LWP::Debug('+');
        $ua = new LWP::UserAgent;
        $request = new HTTP::Request('HEAD', 'http://content.emusic.com/demo/542259/Guitar_Slim-Sufferin_Mind-01-The_Things_That_I_Used_To_Do-demo.mp3');
        $response = $ua->request($request);
        my %headers=%{$response->headers};
        print "dooh" if !defined($response->headers->content_length);
        for my $value (keys %headers) {
            print "($value)=($headers{$value})\n";
        }
      Thanks for the reply. Your code works fine with the free demo MP3 file you checked. But it doesn't work with MP3s that can only be accessed with an emusic.com account. But thanks to your code, I now know the reason:

      http://content.emusic.com/downloads/472279/Sean_Deason-Allegory_Metaphor-01-Creation.mp3

      gives me the following LWP error message:

      LWP::Protocol::http::request: HTTP/1.1 403 Forbidden
      LWP::UserAgent::request: Simple response: Forbidden

      So I guess my question is: how to get the size of a restricted file, where an HTTP request for a header does not work? Am I still in the right place to ask this question?

Re: missing content-length
by mr.nick (Chaplain) on Jan 22, 2001 at 21:06 UTC
    There are times when the web server can't determine the length of the content, such as when it is generated by a script. There are also times when you would think the file being returned is static enough (like a download of an MP3), but in reality a script behind the scenes is bouncing the contents to you (using session IDs or the like). In that case the server itself never knows EXACTLY what the file is, and therefore can't stat() it to determine its length.

    I hate to give answers like this, but: tough. Even download managers (like GoZilla and Download Accelerator) can't always determine the filesize.

      Thanks for your reply: I guess I know what you are talking about, I'm just wondering why Netscape knows the filesize when I manually download an emusic MP3 (can be seen in the progress bar).

      Anyway, my current workaround is the following: emusic encodes its MP3s at 128 kbit/s. Furthermore, the expected length of each song is given in min:sec on the referring page. So I extract the duration of each song, calculate the expected file size, and check whether my downloaded file falls approximately within this range. It's a hack and only catches gross errors.
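
      The arithmetic behind this workaround can be sketched roughly as follows. This is only an illustration, not the code actually used: the sub names, the strict min:sec format, and the tolerance parameter are all assumptions, and it treats "128" as kilobits per second at a constant bitrate, ignoring ID3 tags.

```perl
use strict;
use warnings;

# Expected size in bytes of a constant-bitrate MP3 (hypothetical helper).
# $duration is "min:sec" as shown on the page; $kbps is the bitrate in kbit/s.
sub expected_bytes {
    my ($duration, $kbps) = @_;
    my ($min, $sec) = $duration =~ /^(\d+):(\d\d)$/
        or die "unexpected duration format: $duration\n";
    return ($min * 60 + $sec) * $kbps * 1000 / 8;   # bits per second -> bytes
}

# True if $actual is within $tolerance (a fraction, e.g. 0.05) of the estimate.
sub size_plausible {
    my ($actual, $duration, $kbps, $tolerance) = @_;
    my $expected = expected_bytes($duration, $kbps);
    return abs($actual - $expected) <= $expected * $tolerance;
}

# Example: a 3:30 track at 128 kbit/s is 210 s * 16000 bytes/s.
printf "%d\n", expected_bytes('3:30', 128);   # prints 3360000
```

      Since the page only gives the duration to the nearest second, a tolerance of a few percent is probably the best such a check can do.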

Re: missing content-length
by Rune (Initiate) on Jan 26, 2001 at 18:12 UTC
    It could be because the emusic webserver specifically doesn't allow the HEAD request, but does pass a Content-Length on a GET. That would explain why Netscape knows the length of the mp3s. If that is the case, the $response->headers will know the Content-Length after a GET.

    I've seen several webservers do this, possibly to make it more difficult to crawl their sites. I have neither an emusic account nor LWP installed here, but I remember trying to make a crawler and running into this problem.

    But, as mentioned in another writeup, if the file is being passed by a lame script, there might not be any Content-Length field at all.

    If you want to know the size while downloading, you should probably use a callback when doing the request and pass the $response object to the callback. (And then have the callback die if the size isn't acceptable, I guess).
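
    A minimal sketch of that callback idea, assuming a GET does return a Content-Length. The sub names and the min/max byte limits are made up for illustration; the `:content_cb` option and the X-Died header are standard LWP::UserAgent behaviour. LWP is loaded lazily so the size check can be exercised on its own.

```perl
use strict;
use warnings;

# Decide from the advertised length whether the download should continue.
# Returns a reason string to abort, or undef to carry on (hypothetical helper).
sub abort_reason {
    my ($content_length, $min_bytes, $max_bytes) = @_;
    return undef unless defined $content_length;   # no header: cannot judge
    return "too small ($content_length bytes)" if $content_length < $min_bytes;
    return "too large ($content_length bytes)" if $content_length > $max_bytes;
    return undef;
}

sub fetch_checked {
    my ($url, $file, $min_bytes, $max_bytes) = @_;
    require LWP::UserAgent;                # needs libwww-perl installed
    my $ua = LWP::UserAgent->new;          # cookie setup omitted, as in the question
    open my $out, '>:raw', $file or die "cannot write $file: $!\n";
    my $response = $ua->get($url, ':content_cb' => sub {
        my ($chunk, $res) = @_;            # LWP passes the data chunk and the response
        my $why = abort_reason($res->headers->content_length, $min_bytes, $max_bytes);
        die "$why\n" if defined $why;      # dying in the callback aborts the transfer
        print {$out} $chunk;
    });
    close $out or die "cannot close $file: $!\n";
    return $response;                      # carries an X-Died header if the callback died
}
```

    Calling something like fetch_checked($url, 'song.mp3', 1_000_000, 20_000_000) and then looking at $response->header('X-Died') would show whether the callback bailed out; the limits here are arbitrary.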

    But even if you do get a Content-Length from either HEAD or GET, some webservers may return a wrong Content-Length or 0, as most browsers apparently use it only for display purposes.
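
    Given those caveats, a post-download check could treat a missing or zero Content-Length as "unknown" rather than as a failure. A rough sketch, where check_size is a hypothetical name and the three return values are just one possible convention:

```perl
use strict;
use warnings;

# Compare a file on disk with the advertised Content-Length.
# Returns 'ok', 'mismatch', or 'unknown' when the header is missing or 0.
sub check_size {
    my ($file, $content_length) = @_;
    return 'unknown' unless $content_length;     # undef or 0: nothing to compare
    my $actual = (stat $file)[7];                # size in bytes, undef if no such file
    return 'mismatch' unless defined $actual;
    return $actual == $content_length ? 'ok' : 'mismatch';
}
```

    A 'mismatch' still only means the two numbers differ; if the server sent a wrong header, the file itself may be fine.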