Advice needed on chunked read + byte-range logic

isync has asked for the wisdom of the Perl Monks concerning the following question:

I need to combine a chunked read, which produces chunks of (mostly) equal size until EOF, with an option to control where the read() starts and where it ends. This is what I came up with:

    # chunked read() with optional ranges: the way read() is used here, we can't use the LENGTH, OFFSET feature of read()
    # so we need to use the initial seek() in combination with a limit by position
    my $chunk_size = $cnf->{chunk_size} || 1024;
    my $bytes_in = $cnf->{bytes_in} || 0;
    my $bytes_out= $cnf->{bytes_out};

    my $pos = $bytes_in;
    seek($fh,$bytes_in,0);
    while ( read( $fh, my $buffer, $chunk_size ) ) {
        $pos += length($chunk_size);

        if( defined($bytes_out) && $pos >= $bytes_out){
            # return bytes::substr($buffer, 0, ($chunk_size - ($pos - $bytes_out)) );       # make last chunk shorter
            print bytes::substr($buffer, 0, ($chunk_size - ($pos - $bytes_out)) );
            last;
        }else{
            # return $buffer;
            print $buffer;
        }
    }

As you can see, I avoided tell() to determine position. I blindly assumed it would be slower/produce a disk operation, hence the reliance on a calculated $pos.
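Whether tell() costs anything is easy to check empirically. The following is only a minimal sketch that compares a calculated position with tell() after every read; the temporary file, its 5000-byte size, and the start offset of 100 are made-up test values, not part of the original code. It checks that the two agree and does not measure speed.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Hypothetical test data: 5000 bytes in a temporary file.
my ($tmp_fh, $tmp_name) = tempfile( UNLINK => 1 );
binmode($tmp_fh);
print {$tmp_fh} 'x' x 5000;
close($tmp_fh) or die $!;

open(my $fh, '<', $tmp_name) or die $!;
binmode($fh);

my $chunk_size = 1024;
my $bytes_in   = 100;
my $pos        = $bytes_in;   # calculated position
my $mismatches = 0;

seek($fh, $bytes_in, 0) or die $!;
while ( my $rv = read($fh, my $buffer, $chunk_size) ) {
    $pos += $rv;                          # count what read() actually returned
    $mismatches++ if $pos != tell($fh);   # compare with the position tell() reports
}
close($fh);

print "final pos: $pos, mismatches: $mismatches\n";
```

Adding the return value of read() rather than $chunk_size keeps the calculated position right even when the last chunk is short.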

Please suggest improvements, monks!


Replies are listed 'Best First'.
Re: Advice needed on chunked read + byte-range logic by ikegami (Patriarch) on Aug 25, 2010 at 22:00 UTC
    use Fcntl qw( SEEK_SET );
    use POSIX qw( ceil );

    my $to_return = $bytes_out-$bytes_in;
    # round up to a whole number of chunks
    my $to_read = ceil($to_return/$chunk_size)*$chunk_size;
    seek($fh, $bytes_in, SEEK_SET);
    read($fh, my $buffer, $to_return);
    # leave the file pointer where a chunk-by-chunk read would have left it
    seek($fh, $bytes_in+$to_read, SEEK_SET);
    return $buffer;
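To make the rounding in the snippet above concrete, here is a small worked sketch. The values 100, 2600 and 1024 are made-up inputs chosen for illustration; $chunk_size stands for the chunk size used for rounding.

```perl
use strict;
use warnings;
use POSIX qw( ceil );

# Hypothetical inputs, not taken from the thread.
my ($bytes_in, $bytes_out, $chunk_size) = (100, 2600, 1024);

my $to_return = $bytes_out - $bytes_in;                        # bytes the caller wants
my $to_read   = ceil($to_return / $chunk_size) * $chunk_size;  # rounded up to whole chunks
my $final_pos = $bytes_in + $to_read;                          # where the second seek() lands

print "$to_return $to_read $final_pos\n";   # prints "2500 3072 3172"
```

So 2500 requested bytes span three chunks of 1024, and the file pointer ends up 3072 bytes past the start offset.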
Re: Advice needed on chunked read + byte-range logic by isync (Hermit) on Aug 25, 2010 at 23:16 UTC
My current code:

    my $chunk_size = $cnf->{chunk_size} || 1024;
    my $bytes_in = $cnf->{bytes_in} || 0;
    my $bytes_out= $cnf->{bytes_out} || undef;

    my $pos = $bytes_in;
    seek($fh,$bytes_in,0);
    while ( read( $fh, my $buffer, $chunk_size ) ) {
        $pos += $chunk_size; # alt.: += length($buffer)

        if( defined($bytes_out) && $pos > $bytes_out){
            print bytes::substr($buffer, 0, ($chunk_size - ($pos - $bytes_out)) ); # make last chunk shorter
            last;
        }elsif( defined($bytes_out) && $pos == $bytes_out){
            print $buffer;
            last;
        }else{
            print $buffer;
        }
    }

Bugs fixed:
- $pos summing was wrong
- differentiate between > and == and decide if we really need another substr() or if a simple print() will do
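The $pos bug in the first version comes from calling length() on the chunk size itself: length() stringifies its argument, so it returns the number of digits, not the number of bytes read. A minimal sketch of the difference, using a made-up 300-byte buffer as a stand-in for a short final read:

```perl
use strict;
use warnings;

my $chunk_size = 1024;
my $buffer     = 'x' x 300;   # hypothetical data standing in for a short final read

my $len_of_number = length($chunk_size);   # the number is stringified to "1024" first
my $len_of_buffer = length($buffer);       # bytes actually held in the buffer

print "$len_of_number $len_of_buffer\n";   # prints "4 300"
```

This also shows why the commented alternative, adding length($buffer), is the safer of the two: adding $chunk_size overcounts whenever read() returns fewer bytes than requested.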
Re^2: Advice needed on chunked read + byte-range logic by ikegami (Patriarch) on Aug 26, 2010 at 01:38 UTC
Why `bytes::substr`? Do you expect `read` to return something other than bytes? If so, you should make it so it doesn't (by using `binmode`) rather than attempting to work around it (by using `bytes::substr`).

The following leaves the file pointer at the same spot as your code:

    use Fcntl qw( SEEK_SET );

    seek($fh, $bytes_in, SEEK_SET) or die($!);
    my $to_read = $bytes_out-$bytes_in;
    while ($to_read > 0) {
        my $rv = read($fh, my $buffer, $chunk_size);
        die($!) if !defined($rv);
        die("Premature eof") if !$rv;
        # trim the last chunk to the requested range
        substr($buffer, $to_read) = '' if $rv > $to_read;
        print($buffer);
        $to_read -= $chunk_size;
    }
Re^3: Advice needed on chunked read + byte-range logic by isync (Hermit) on Aug 27, 2010 at 19:05 UTC
I did so because I thought the data being read might be a text file, e.g. non-UTF-8 ASCII or so. As substr() by default operates in terms of characters, I wanted to prevent it from "falling back" into character mode, so that it always works with byte offsets. I didn't expect the "this is binary data" information to stay attached to the string all the way from a $fh declared with binmode(), through read() into the buffer, to the substr() operation on that string...

Further, I thought binmode() was more of a Win32 thing, and as my code won't ever hit the MS world, I seldom use it. As you refer to it, I think I should go back to using it.

Just for completeness, my former code updated:

    my $chunk_size = $cnf->{chunk_size} || 1024;
    my $bytes_in = $cnf->{bytes_in} || 0;
    my $bytes_out= $cnf->{bytes_out} || undef;

    my $pos = $bytes_in;
    binmode($fh);
    seek($fh,$bytes_in,0);
    while ( read( $fh, my $buffer, $chunk_size ) ) {
        $pos += $chunk_size; # alt.: += length($buffer)

        if( defined($bytes_out) && $pos > $bytes_out){
            print substr($buffer, 0, ($chunk_size - ($pos - $bytes_out)) ); # make last chunk shorter
            last;
        }elsif( defined($bytes_out) && $pos == $bytes_out){
            print $buffer;
            last;
        }else{
            print $buffer;
        }
    }
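The bytes-versus-characters question discussed here depends on the layers of the filehandle, not on the string functions. The following is a minimal sketch of that behaviour; the temporary file and its content (the UTF-8 encoding of "héllo", 6 bytes for 5 characters) are made-up test data.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Hypothetical test data: "h\xC3\xA9llo" is "héllo" encoded as UTF-8.
my ($tmp_fh, $tmp_name) = tempfile( UNLINK => 1 );
binmode($tmp_fh);
print {$tmp_fh} "h\xC3\xA9llo";
close($tmp_fh) or die $!;

# Raw handle: read() counts bytes, so 3 units cut the file after "h" and the two bytes of "é".
open(my $raw, '<:raw', $tmp_name) or die $!;
my $raw_count = read($raw, my $raw_buf, 3);
close($raw);

# Decoding layer: read() counts characters, so 3 units are "h", "é", "l".
open(my $txt, '<:encoding(UTF-8)', $tmp_name) or die $!;
my $txt_count = read($txt, my $txt_buf, 3);
close($txt);

printf "raw: %d units, length %d\n", $raw_count, length($raw_buf);
printf "txt: %d units, length %d\n", $txt_count, length($txt_buf);
```

On a handle without a decoding layer, every character in the buffer is one byte, so plain substr() and length() already work in byte offsets.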
Re^4: Advice needed on chunked read + byte-range logic by ikegami (Patriarch) on Aug 27, 2010 at 22:20 UTC
Re: Advice needed on chunked read + byte-range logic by isync (Hermit) on Aug 25, 2010 at 22:08 UTC
Argh. Sorry! (Original post updated, see code.) I had a return() where a print() should be. See, the function appends these chunks to STDOUT. As I read your logic, it reads everything into a (possibly large) $buffer and then outputs it as a whole. Or did I overlook something in your solution?

2nd update: I had a mix-up in there with $buffer and $bytes, cleared up by renaming to $chunk_size.
Re^2: Advice needed on chunked read + byte-range logic by ikegami (Patriarch) on Aug 25, 2010 at 22:58 UTC
Assuming you don't need to read the characters you end up discarding, I'd use the following (complete with error checking):

    use Fcntl qw( SEEK_SET );
    use List::Util qw( min );

    seek($fh, $bytes_in, SEEK_SET) or die($!);
    my $to_read = $bytes_out-$bytes_in;
    while ($to_read) {
        # never ask for more than what is left of the range
        my $rv = read($fh, my $buffer, min($chunk_size, $to_read));
        die($!) if !defined($rv);
        die("Premature eof") if !$rv;
        $to_read -= $rv;
        print($buffer);
    }

If you do want to move the file pointer beyond the last byte you need, as you are currently doing, then you could continue to always read blocks of $chunk_size.
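For trying the loop above without a real file, here is a sketch that wraps it in a sub and feeds it an in-memory filehandle. The name emit_byte_range, the callback parameter, and the 100-byte test string are hypothetical and introduced only for this example. It uses `while ($to_read > 0)` instead of `while ($to_read)`, so that an end offset before the start offset simply produces no output.

```perl
use strict;
use warnings;
use Fcntl qw( SEEK_SET );
use List::Util qw( min );

# Hypothetical helper: passes bytes [$bytes_in, $bytes_out) of $fh to $emit
# in pieces of at most $chunk_size bytes.
sub emit_byte_range {
    my ($fh, $bytes_in, $bytes_out, $chunk_size, $emit) = @_;
    seek($fh, $bytes_in, SEEK_SET) or die($!);
    my $to_read = $bytes_out - $bytes_in;
    while ($to_read > 0) {
        my $rv = read($fh, my $buffer, min($chunk_size, $to_read));
        die($!) if !defined($rv);
        die("Premature eof") if !$rv;
        $to_read -= $rv;
        $emit->($buffer);
    }
    return;
}

# Usage with an in-memory file standing in for a real one.
my $data = join '', map { chr(65 + $_ % 26) } 0 .. 99;   # 100 bytes, "A".."Z" repeated
open(my $fh, '<', \$data) or die($!);
binmode($fh);

my @chunks;
emit_byte_range($fh, 10, 45, 16, sub { push @chunks, $_[0] });
print join('|', @chunks), "\n";
```

Passing `sub { print $_[0] }` as the callback gives the print-to-STDOUT behaviour from the thread; collecting into an array, as here, just makes the chunk boundaries visible.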
Re^3: Advice needed on chunked read + byte-range logic by isync (Hermit) on Aug 25, 2010 at 23:27 UTC
Elegant. Good that you reminded me of error checking! btw: perlmonks.org should be renamed askikegami.org ;) Thanks!