I've run into an interesting problem while testing a new piece of code I'm writing.

During testing, I pointed my code at dozens of websites (static content, dynamic content, images, PDFs, etc.) and it all worked great. I was sending a HEAD request and checking the remote end's Content-Type and Content-Length headers to decide whether I should fetch the resource or not.

Basically, if the size reported in Content-Length was too large, I'd skip the fetch.

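use LWP::UserAgent;
use HTTP::Request;

my $ua = LWP::UserAgent->new;

# HEAD returns headers only, so $content is empty here
my $req         = HTTP::Request->new(HEAD => $url);
my $resp        = $ua->request($req);
my $type        = $resp->header('Content-Type');
my $content     = $resp->content;
my $content_len = $resp->header('Content-Length');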

This was working great, until I realized that a lot of servers don't send a Content-Length header. DOH! Even sites serving static, flat text or HTML content are not sending one.

In the above snippet, I'm using HEAD so that I don't GET larger files, only to skip processing them after I've already fetched them.

So I started trying to figure out a way to determine the length of the remote content, without actually fetching the content itself, and this is where I'm stuck.

I could do this:

my $req         = HTTP::Request->new(GET => $url);
my $resp        = $ua->request($req);    # the whole body is downloaded here
my $content     = $resp->content;
my $content_len = length($content);

But now I'm doing a GET, and if someone decides to point it at a 20-gigabyte file, a DVD ISO or something like that, it'll drown my bandwidth and effectively DoS my tool for other users.

Is there some other way to do this, without doing a full fetch of the remote resource?

Update: This sort-of works, but for sites without a Content-Length header, I do a double-hit, HEAD first, then GET second. Is there a better way?

my $req         = HTTP::Request->new(HEAD => $pl_url);
my $resp        = $ua->request($req);
my $type        = $resp->header('Content-Type');
my $status_line = $resp->status_line;
my ($content, $content_len);

if ($resp->header('Content-Length')) {
    $content_len = $resp->header('Content-Length');
}
else {
    $req         = HTTP::Request->new(GET => $pl_url);
    $resp        = $ua->request($req);
    $content     = $resp->content;
    $content_len = length($content);
}
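One direction that might avoid the double hit is a single GET with a cap on how much LWP is willing to read, using LWP::UserAgent's max_size attribute. The sketch below is only that: fetch_capped() and the keys of the hash it returns are names I made up for illustration, and the 206 handling assumes that LWP sends a Range header when max_size is set, so a server may answer with partial content.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# fetch_capped() is a hypothetical helper, not part of LWP.
# It issues one GET and lets LWP stop reading once the body
# grows past $limit bytes.
sub fetch_capped {
    my ($url, $limit) = @_;

    my $ua = LWP::UserAgent->new(timeout => 30);
    $ua->max_size($limit);

    my $resp     = $ua->get($url);
    my $received = length($resp->content);

    # LWP adds "Client-Aborted: max_size" when it cut the transfer short.
    my $aborted = ($resp->header('Client-Aborted') // '') eq 'max_size';

    # Assumption: a 206 reply means the server honoured a Range header,
    # so the body may be a slice. Content-Range then carries the total.
    my $total;
    if (($resp->header('Content-Range') // '') =~ m{/(\d+)\s*$}) {
        $total = $1;
    }
    my $partial = $resp->code == 206
        && !(defined $total && $total == $received);

    my $truncated = ($aborted || $partial) ? 1 : 0;

    return {
        status    => $resp->status_line,
        type      => scalar $resp->header('Content-Type'),
        declared  => scalar $resp->header('Content-Length'),  # may be undef
        total     => $total,                                  # may be undef
        received  => $received,
        truncated => $truncated,
        content   => $truncated ? undef : $resp->content,
    };
}

# Usage: perl script.pl http://example.com/ (the 1 MB cap is arbitrary)
if (@ARGV) {
    my $r = fetch_capped($ARGV[0], 1_000_000);
    print "$r->{status}: ",
        $r->{truncated}
            ? "too large, skipped\n"
            : "$r->{received} bytes\n";
}
```

With this approach the received size can overshoot the cap by up to one read chunk before LWP stops, so the cap bounds the bandwidth used rather than giving an exact length for oversized resources.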

In reply to Determining Content-Length when there is no Content-Length header by hacker
