How to read just part of a url's content

no_germs has asked for the wisdom of the Perl Monks concerning the following question:

Re: How to read just part of a url's content (byte ranges) by tye (Sage) on May 29, 2007 at 15:41 UTC
Add a "~~Content-~~Range: bytes=0-512" header to your request if what you want is just some of the bytes of the response (at least if the response previously included the "Accept-Ranges: bytes" header). Google can tell you more.

Update: Thanks to jettero for finding the right header. I've done this in the past but my quick look for the specifics ran into the wrong header at first.

- tye
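For anyone wanting to try the Range header from Perl, here is a minimal sketch, assuming LWP::UserAgent and HTTP::Request are installed. The helper name `range_request` is made up for illustration, and the URL comes from the command line. Byte ranges are inclusive at both ends, so `bytes=0-511` asks for the first 512 bytes. A server that honours the header should answer 206; one that ignores it will most likely send the whole thing with a plain 200.

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::UserAgent;

# Hypothetical helper: build a GET request asking for an inclusive byte range.
sub range_request {
    my ( $url, $first, $last ) = @_;
    my $request = HTTP::Request->new( GET => $url );
    $request->header( Range => "bytes=$first-$last" );
    return $request;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $ua       = LWP::UserAgent->new;
    my $response = $ua->request( range_request( $ARGV[0], 0, 511 ) );

    if ( $response->code == 206 ) {
        # 206 Partial Content: the server honoured the range.
        my $content_range = $response->header('Content-Range');
        $content_range = 'no Content-Range header' unless defined $content_range;
        printf "got %d bytes (%s)\n", length( $response->content ), $content_range;
    }
    elsif ( $response->is_success ) {
        # A plain success means the header was ignored and everything was sent.
        printf "range ignored, got all %d bytes\n", length( $response->content );
    }
    else {
        print "request failed: ", $response->status_line, "\n";
    }
}
```

Run it as `perl range.pl http://perlmonks.org/` (the script name is arbitrary). Checking the status code matters because a server is free to ignore the header.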
Re: How to read just part of a url's content by kyle (Abbot) on May 29, 2007 at 15:30 UTC
If you use LWP::UserAgent, it looks like you can pass the `:content_cb` option to `get` and deal with the response a little at a time. If your callback calls die, the request is aborted, so you can quit reading once you've found what you want.

Update with code and output:

    use LWP::UserAgent;

    sub little_bit {
        my ( $content, $response, $protocol ) = @_;
        printf "chunk length %d\n", length $content;
        if ( $content =~ /the/i ) {
            print "chunk with the 'the': $content\n";
            die;
        }
    }

    my $ua = LWP::UserAgent->new();
    my $response = $ua->get(
        'http://perlmonks.org/',
        ':content_cb'     => \&little_bit,
        ':read_size_hint' => 100
    );

    __END__
    chunk length 100
    chunk length 100
    chunk length 100
    chunk with the 'the': The Monastery Gates </title> <link rel="stylesheet" href="/css/common.css" type="text/
Re: How to read just part of a url's content by jettero (Monsignor) on May 29, 2007 at 15:17 UTC
You mean just get part of the file located with the URL? There is almost certainly a way to do it, since wget (et al.) can be instructed to continue. Personally, I can't find a single relevant option in LWP, LWP::UserAgent, HTTP::Request, HTTP::Headers, and more. I'm very curious to see how you would get a selected portion of a file with LWP. Someone will know.

UPDATE #1: Further investigation has revealed that you can set a "range" header with `$request_object->header( $field => $value )`, but I haven't yet worked out the particulars of the header.

UPDATE #2: I did finally figure it out.

-Paul
Re^2: How to read just part of a url's content by ikegami (Patriarch) on May 29, 2007 at 19:56 UTC
"You mean just get part of the file located with the URL?"

No, "file" is a less accurate word. The content identified by a URI is not necessarily a file.
Re^3: How to read just part of a url's content by jettero (Monsignor) on May 30, 2007 at 03:25 UTC
That's a semantic argument. Is a file a space on a hard drive? Is it a stream of bytes? A sector on a tape? There's an application where I work that describes a file as 100 variably sized blocks. My point was that the URL just describes the location of something. You already have it.

-Paul
Re: How to read just part of a url's content by naikonta (Curate) on May 29, 2007 at 15:18 UTC
"looking for a specific string in a url"

`use URI;`

"get just part of the content of the url"

`use HTML::TokeParser;`

Open source softwares? Share and enjoy. Make profit from them if you can. Yet, share and enjoy!
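To make those two module suggestions a bit more concrete, here is a rough sketch, assuming URI and HTML::TokeParser are installed. The URL, the HTML fragment, and the helper name `first_title` are invented for illustration and are not from the original post. It splits a URL into parts you can search, then pulls the title out of a fragment that stops partway through the page, the way a partial download would.

```perl
use strict;
use warnings;
use URI;
use HTML::TokeParser;

# Hypothetical helper: return the text of the first <title> element
# in an HTML fragment. Returns nothing when there is no <title> tag.
sub first_title {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new( \$html )
        or die "cannot create parser";
    return unless $parser->get_tag('title');
    return $parser->get_trimmed_text('/title');
}

# Looking for a string in the URL itself: split it into components first.
# This URL is made up for the example.
my $uri = URI->new('http://perlmonks.org/index.pl?node=Tutorials');
print "host:  ", $uri->host,  "\n";
print "query: ", $uri->query, "\n";
print "query mentions Tutorials\n" if $uri->query =~ /Tutorials/;

# Getting part of the content: the fragment below has no closing
# </head> or </html>, as if only the first bytes had been fetched.
my $fragment = "<html><head><title>\nThe Monastery Gates\n</title>"
    . '<link rel="stylesheet" href="/css/common.css">';
print "title: ", first_title($fragment), "\n";
```

The parser reads tokens on demand, so it does not need the rest of the document to find the title.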