
Ranges are documented in section 14.35 of the HTTP/1.1 RFC (RFC 2616). They allow an HTTP client to request only part (or parts) of a resource that would ordinarily be retrieved in full (or in server-chosen chunks) from the server.

The RFC only mandates byte ranges, so use those rather than lines in order to be portable. However, if you are after the first 30 lines of a 50,000-line response, just pick a byte range large enough that it will likely contain at least your 30 lines. If fewer lines are returned, you can issue subsequent requests until you have all the data you require.
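
A minimal sketch of that approach, assuming a server that honours the Range header and answers with 206 Partial Content. The helper names (range_value, first_lines, lwp_fetcher) and the 4096-byte chunk size are made up for illustration. The fetcher is passed in as a code ref so the loop can be tried without a network connection.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Build a Range header value covering $size bytes starting at $offset.
sub range_value {
    my ($offset, $size) = @_;
    return sprintf 'bytes=%d-%d', $offset, $offset + $size - 1;
}

# Call $fetch->($range) repeatedly until $want lines have been collected
# or the resource runs out. $fetch returns the bytes for that range, or
# undef / '' when nothing more is available.
sub first_lines {
    my ($fetch, $want, $size) = @_;
    my ($data, $offset) = ('', 0);
    while (($data =~ tr/\n//) < $want) {
        my $part = $fetch->(range_value($offset, $size));
        last unless defined $part && length $part;
        $data   .= $part;
        $offset += length $part;
        last if length $part < $size;    # short read: end of resource
    }
    my @lines = split /^/, $data;        # keeps the trailing newlines
    splice @lines, $want if @lines > $want;
    return join '', @lines;
}

# Fetcher backed by LWP::UserAgent (loaded only when actually used).
sub lwp_fetcher {
    my ($url) = @_;
    require LWP::UserAgent;
    my $ua = LWP::UserAgent->new;
    return sub {
        my ($range) = @_;
        my $res = $ua->get($url, Range => $range);
        # Anything other than 206 (e.g. 200 because the server ignored
        # the Range header, or 416 past the end) stops the loop here.
        return $res->code == 206 ? $res->content : undef;
    };
}

# Only touches the network when a URL is given on the command line.
if (@ARGV) {
    print first_lines(lwp_fetcher($ARGV[0]), 30, 4096);
}
```

A server that ignores Range will send the whole resource with a 200 status, so real code would want to handle that case rather than simply giving up as this sketch does.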


Re^6: Split file, first 30 lines only (HTTP Ranges and :read_size_hint)
by Discipulus (Canon) on Mar 02, 2017 at 10:28 UTC
    Isn't that what :read_size_hint => $bytes of LWP::UserAgent is for?

    Or, in other words: is :read_size_hint the implementation of the HTTP ranges you are talking about?

    If I remember correctly, the word "hint" is there because there is no guarantee that the chunk retrieved will be exactly $bytes long: it is merely a hint, which LWP may disregard.

    Despite that caveat, which I remember reading somewhere, the following example seems to demonstrate that data is retrieved in chunks of exactly the desired length, even for bizarre values of $bytes.

    Obviously the last chunk will be of arbitrary length.
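
One way to check this kind of behaviour, as a hedged sketch rather than the original test: record the length of every chunk passed to the :content_cb callback and print the list afterwards. chunk_recorder and the value 77 are a hypothetical name and number chosen for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns a callback that stores the length of each
# chunk it is given in the array referenced by $lengths.
sub chunk_recorder {
    my ($lengths) = @_;
    return sub {
        my ($chunk) = @_;
        push @$lengths, length $chunk;
        return;
    };
}

# Only touches the network when a URL is given on the command line.
if (@ARGV) {
    require LWP::UserAgent;
    my @lengths;
    my $res = LWP::UserAgent->new->get(
        $ARGV[0],
        ':content_cb'     => chunk_recorder(\@lengths),
        ':read_size_hint' => 77,    # deliberately odd chunk size
    );
    print $res->status_line, "\n";
    print "chunk lengths: @lengths\n";
}
```

Whether every chunk but the last comes out at exactly the requested size may depend on the server and the connection, so a single run is only suggestive.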

    thanks

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
      is :read_size_hint the implementation of the HTTP ranges you are talking about?

      It is one implementation of it, but it requires careful use of the callback. As you can see from your code, it downloads all the content, but in chunks of your specified size. Since the object here is to download only the minimum amount of data from the server, the callback must die to stop the subsequent chunks being retrieved, e.g.:

      #!/usr/bin/env perl
      use strict;
      use warnings;
      use utf8;
      use LWP::UserAgent;

      # Modify these three variables only to suit
      my $url       = 'http://www.gutenberg.org/ebooks/1533.txt.utf-8'; # MacBeth
      my $wantlines = 30;   # Retrieve this number of lines
      my $bytes     = 256;  # Chunk size to download

      my $firstndata;
      my $linecount  = 0;
      my $chunkcount = 0;

      sub add_chunk {
          my ($chunk, $res, $proto) = @_;
          $firstndata .= $chunk;
          $linecount += () = $chunk =~ /\n/g;
          $chunkcount++;
          die if $linecount >= $wantlines;
      }

      my $ua  = LWP::UserAgent->new;
      my $res = $ua->get ($url, ':content_cb' => \&add_chunk, ':read_size_hint' => $bytes);
      print "Retrieved $linecount lines in $chunkcount chunks from $url:\n\n$firstndata\n";

      If you run this, you will see that it retrieves slightly more than the 30 lines required, but substantially less than the full text. This seems like a reasonable compromise and is, of course, tunable by the user to the specific task at hand by varying $wantlines and $bytes.
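
If exactly 30 lines are wanted, the surplus can be trimmed afterwards. A possible variation, offered as a sketch only: the callback is built by a small factory, and the result is cut down to the requested number of lines. line_limited_cb and take_lines are hypothetical names. The X-Died response header is where LWP reports the message from a callback that died.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return a callback that appends each chunk to $$buf and dies once
# at least $want newlines have been seen.
sub line_limited_cb {
    my ($want, $buf) = @_;
    my $lines = 0;
    return sub {
        my ($chunk) = @_;
        $$buf  .= $chunk;
        $lines += ($chunk =~ tr/\n//);    # count newlines in this chunk
        die "line limit reached\n" if $lines >= $want;
    };
}

# Keep only the first $n lines of $data (trailing newlines preserved).
sub take_lines {
    my ($data, $n) = @_;
    my @lines = split /^/, $data;
    $#lines = $n - 1 if @lines > $n;
    return join '', @lines;
}

# Only touches the network when a URL is given on the command line.
if (@ARGV) {
    require LWP::UserAgent;
    my $buf = '';
    my $res = LWP::UserAgent->new->get(
        $ARGV[0],
        ':content_cb'     => line_limited_cb(30, \$buf),
        ':read_size_hint' => 256,
    );
    my $died = $res->header('X-Died') // '';
    warn "download stopped early: $died" if length $died;
    print take_lines($buf, 30);
}
```

Giving die a message makes it easier to tell a deliberate stop apart from some other failure inside the callback.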

      thank you Discipulus!
Re^6: Split file, first 30 lines only (HTTP Ranges)
by wrkrbeee (Scribe) on Mar 02, 2017 at 16:07 UTC
    Thanks hippo!!