isync has asked for the wisdom of the Perl Monks concerning the following question:

I need to combine a chunked read, which produces chunks of (mostly) equal size until EOF, with an option to control where the read() starts and where it ends. This is what I came up with:
# chunked read() with optional ranges: the way read() is used here, we can't use the LENGTH, OFFSET feature of read()
# so we need to use the initial seek() in combination with a limit by position
my $chunk_size = $cnf->{chunk_size} || 1024;
my $bytes_in = $cnf->{bytes_in} || 0;
my $bytes_out= $cnf->{bytes_out};
my $pos = $bytes_in;
seek($fh,$bytes_in,0);
while ( read( $fh, my $buffer, $chunk_size ) ) {
    $pos += length($chunk_size);
    if( defined($bytes_out) && $pos >= $bytes_out){
        # return bytes::substr($buffer, 0, ($chunk_size - ($pos - $bytes_out)) ); # make last chunk shorter
        print bytes::substr($buffer, 0, ($chunk_size - ($pos - $bytes_out)) );
        last;
    }else{
        # return $buffer;
        print $buffer;
    }
}
As you can see, I avoided tell() to determine position. I blindly assumed it would be slower/produce a disk operation, hence the reliance on a calculated $pos.
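
On the tell() question: as far as I know, tell() on a buffered Perl filehandle reports the logical position that read() has reached, so it should agree with a position calculated by hand. A minimal sketch to check that agreement, assuming a scratch file from File::Temp (the file size, start offset and chunk size are made up for illustration, and it says nothing about speed):

```perl
use strict;
use warnings;
use Fcntl qw( SEEK_SET );
use File::Temp qw( tempfile );

# Hypothetical test data: 5000 bytes in a scratch file.
my ($tmp_fh, $tmp_name) = tempfile( UNLINK => 1 );
binmode($tmp_fh);
print {$tmp_fh} 'x' x 5000;
close($tmp_fh) or die($!);

open(my $fh, '<', $tmp_name) or die($!);
binmode($fh);

my $chunk_size = 1024;
my $pos        = 100;    # plays the role of $bytes_in
my $mismatches = 0;

seek($fh, $pos, SEEK_SET) or die($!);
while ( my $rv = read($fh, my $buffer, $chunk_size) ) {
    $pos += $rv;                            # count what was actually read
    $mismatches++ if tell($fh) != $pos;     # compare with what tell() reports
}
close($fh);

print "final position: $pos, mismatches: $mismatches\n";
```

Adding the return value of read() rather than $chunk_size keeps the calculated position right for a short final chunk as well.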

Please suggest improvements, monks!

Replies are listed 'Best First'.
Re: Advice needed on chunked read + byte-range logic
by ikegami (Patriarch) on Aug 25, 2010 at 22:00 UTC
    use Fcntl qw( SEEK_SET );
    use POSIX qw( ceil );

    my $to_return = $bytes_out-$bytes_in;
    my $to_read = ceil($to_return/$bytes)*$bytes;
    seek($fh, $bytes_in, SEEK_SET);
    read($fh, my $buffer, $to_return);
    seek($fh, $bytes_in+$to_read, SEEK_SET);
    return $buffer;
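
The rounding in the reply above can be followed with concrete numbers. A small sketch, assuming made-up offsets and a chunk size of 1024 (the reply calls the chunk size $bytes; it was renamed $chunk_size later in the thread):

```perl
use strict;
use warnings;
use POSIX qw( ceil );

# Hypothetical values, chosen only to make the arithmetic visible.
my $chunk_size = 1024;
my $bytes_in   = 100;
my $bytes_out  = 2600;

my $to_return = $bytes_out - $bytes_in;                        # bytes the caller wants
my $to_read   = ceil($to_return / $chunk_size) * $chunk_size;  # rounded up to whole chunks
my $end_pos   = $bytes_in + $to_read;                          # where the final seek() lands

print "to_return=$to_return to_read=$to_read end_pos=$end_pos\n";
# prints: to_return=2500 to_read=3072 end_pos=3172
```

So only 2500 bytes are returned, but the file pointer ends up where three full chunks would have left it.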
Re: Advice needed on chunked read + byte-range logic
by isync (Hermit) on Aug 25, 2010 at 23:16 UTC
    My current code:
    my $chunk_size = $cnf->{chunk_size} || 1024;
    my $bytes_in = $cnf->{bytes_in} || 0;
    my $bytes_out= $cnf->{bytes_out} || undef;
    my $pos = $bytes_in;
    seek($fh,$bytes_in,0);
    while ( read( $fh, my $buffer, $chunk_size ) ) {
        $pos += $chunk_size; # alt.: += length($buffer)
        if( defined($bytes_out) && $pos > $bytes_out){
            print bytes::substr($buffer, 0, ($chunk_size - ($pos - $bytes_out)) ); # make last chunk shorter
            last;
        }elsif( defined($bytes_out) && $pos == $bytes_out){
            print $buffer;
            last;
        }else{
            print $buffer;
        }
    }
    bugs fixed:
    - $pos summing was wrong
    - differentiate between > and == and decide if we *really* need another substr() or if a simple print() will do

      Why bytes::substr? Do you expect read to return something other than bytes? If so, you should make it so it doesn't (by using binmode) rather than attempting to work around it (by using bytes::substr).
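
A minimal sketch of the binmode point, assuming a scratch file from File::Temp that holds the two UTF-8 bytes of "é" followed by "abc" (the data is made up for illustration). With binmode on the handle, read() hands back bytes, so plain length() and substr() already work in byte offsets:

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Hypothetical data: "é" encoded as UTF-8 (two bytes) followed by "abc".
my ($tmp_fh, $tmp_name) = tempfile( UNLINK => 1 );
binmode($tmp_fh);
print {$tmp_fh} "\xC3\xA9abc";
close($tmp_fh) or die($!);

open(my $fh, '<', $tmp_name) or die($!);
binmode($fh);                          # no decoding layer: read() returns bytes
my $rv = read($fh, my $buffer, 1024);
die($!) if !defined($rv);
close($fh);

my $len   = length($buffer);           # counts bytes here
my $first = substr($buffer, 0, 1);     # first byte only, not the whole "é"

printf "read=%d length=%d first=0x%02X\n", $rv, $len, ord($first);
# prints: read=5 length=5 first=0xC3
```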

      The following leaves the file pointer at the same spot as your code:

      use Fcntl qw( SEEK_SET );

      seek($fh, $bytes_in, SEEK_SET) or die($!);
      my $to_read = $bytes_out-$bytes_in;
      while ($to_read > 0) {
          my $rv = read($fh, my $buffer, $chunk_size);
          die($!) if !defined($rv);
          die("Premature eof") if !$rv;
          substr($buffer, $to_read) = '' if $rv > $to_read; # trim the last chunk
          print($buffer);
          $to_read -= $rv;
      }
        I did so because I thought the data being read might be a text file, e.g. non-UTF-8 ASCII or so. As substr() by default operates in terms of characters, I wanted to prevent it from "falling back" into character mode and make it *always* use byte offsets. I didn't expect the "this is binary data" property of a string to survive the whole chain: $fh set to binmode() -> read() into buffer -> substr() on that string...
        Further, I thought binmode() was more of a Win32 thing, and as my code won't ever hit the MS world, I seldom use it. As *you* refer to it, I think I should go back to using it.

        Just for completeness, my former code updated:
        my $chunk_size = $cnf->{chunk_size} || 1024;
        my $bytes_in = $cnf->{bytes_in} || 0;
        my $bytes_out= $cnf->{bytes_out} || undef;
        my $pos = $bytes_in;
        binmode($fh);
        seek($fh,$bytes_in,0);
        while ( read( $fh, my $buffer, $chunk_size ) ) {
            $pos += $chunk_size; # alt.: += length($buffer)
            if( defined($bytes_out) && $pos > $bytes_out){
                print substr($buffer, 0, ($chunk_size - ($pos - $bytes_out)) ); # make last chunk shorter
                last;
            }elsif( defined($bytes_out) && $pos == $bytes_out){
                print $buffer;
                last;
            }else{
                print $buffer;
            }
        }
Re: Advice needed on chunked read + byte-range logic
by isync (Hermit) on Aug 25, 2010 at 22:08 UTC
    arg. Sorry! (original post updated, see code)

    I had a return() where a print() should be: the function appends these chunks to STDOUT. As I read your logic, it reads everything into one (possibly large) $buffer and then outputs it as a whole. Or did I overlook something in your solution?

    2nd update: had a mix up in there with $buffer and $bytes, cleared up by renaming to $chunk_size.

      Assuming you don't need to read the characters you end up discarding, I'd use the following: (Complete with error checking)

      use Fcntl qw( SEEK_SET );
      use List::Util qw( min );

      seek($fh, $bytes_in, SEEK_SET) or die($!);
      my $to_read = $bytes_out-$bytes_in;
      while ($to_read) {
          my $rv = read($fh, my $buffer, min($chunk_size, $to_read));
          die($!) if !defined($rv);
          die("Premature eof") if !$rv;
          $to_read -= $rv;
          print($buffer);
      }

      If you do want to move the file pointer beyond the last byte you need, as you are currently doing, then you could continue to always read blocks of $chunk_size as you do now.
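
The original question treated the end of the range as optional, while the loop above assumes $bytes_out is set. A rough sketch of one way to combine the two, assuming a hypothetical helper named print_range() that takes the input and output handles as arguments (neither the name nor the argument order comes from this thread) and treats an undefined $bytes_out as "read until EOF":

```perl
use strict;
use warnings;
use Fcntl qw( SEEK_SET );
use List::Util qw( min );

# print_range() is a hypothetical helper name, not something from the thread.
# $bytes_out is an exclusive end offset; undef means "read until EOF".
# Returns the number of bytes written to $out_fh.
sub print_range {
    my ($in_fh, $out_fh, $bytes_in, $bytes_out, $chunk_size) = @_;
    $bytes_in   ||= 0;
    $chunk_size ||= 1024;

    binmode($in_fh);
    seek($in_fh, $bytes_in, SEEK_SET) or die($!);

    my $to_read = defined($bytes_out) ? $bytes_out - $bytes_in : undef;
    my $written = 0;
    while ( !defined($to_read) || $to_read > 0 ) {
        my $want = defined($to_read) ? min($chunk_size, $to_read) : $chunk_size;
        my $rv   = read($in_fh, my $buffer, $want);
        die($!) if !defined($rv);
        if (!$rv) {
            last if !defined($to_read);    # open-ended range: EOF is the normal end
            die("Premature eof");          # bounded range: the file was too short
        }
        print {$out_fh} $buffer;
        $to_read -= $rv if defined($to_read);
        $written += $rv;
    }
    return $written;
}
```

With the variables from the thread it could be called as print_range($fh, \*STDOUT, $cnf->{bytes_in}, $cnf->{bytes_out}, $cnf->{chunk_size}). Like the loop above, it never reads past the last requested byte.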

        Elegant.

        Good that you reminded me of error checking!

        btw: perlmonks.org should be renamed askikegami.org ;) Thanks!