in reply to Advice needed on chunked read + byte-range logic

My current code:
    my $chunk_size = $cnf->{chunk_size} || 1024;
    my $bytes_in   = $cnf->{bytes_in}   || 0;
    my $bytes_out  = $cnf->{bytes_out}  || undef;
    my $pos        = $bytes_in;
    seek($fh, $bytes_in, 0);
    while ( read( $fh, my $buffer, $chunk_size ) ) {
        $pos += $chunk_size;    # alt.: += length($buffer)
        if ( defined($bytes_out) && $pos > $bytes_out ) {
            # make last chunk shorter
            print bytes::substr($buffer, 0, ($chunk_size - ($pos - $bytes_out)) );
            last;
        } elsif ( defined($bytes_out) && $pos == $bytes_out ) {
            print $buffer;
            last;
        } else {
            print $buffer;
        }
    }
bugs fixed:
- $pos summing was wrong
- differentiate between > and == to decide whether we *really* need another substr() or whether a simple print() will do

Re^2: Advice needed on chunked read + byte-range logic
by ikegami (Patriarch) on Aug 26, 2010 at 01:38 UTC

    Why bytes::substr? Do you expect read to return something other than bytes? If so, you should make it so it doesn't (by using binmode) rather than attempting to work around it (by using bytes::substr).

    The following leaves the file pointer at the same spot as your code:

        use Fcntl qw( SEEK_SET );

        seek($fh, $bytes_in, SEEK_SET) or die($!);
        my $to_read = $bytes_out - $bytes_in;
        while ($to_read > 0) {
            my $rv = read($fh, my $buffer, $chunk_size);
            die($!) if !defined($rv);          # read error
            die("Premature eof") if !$rv;      # file ended before $bytes_out
            substr($buffer, $to_read) = '' if $rv > $to_read;   # trim the last chunk
            print($buffer);
            $to_read -= $rv;
        }

      I did so because I thought the data being read might be a text file, e.g. non-UTF-8 ASCII or so. As substr() by default operates in terms of characters, I wanted to prevent it "falling back" into character mode and *always* return byte offsets. I didn't expect the "this is binary data" property to remain intact all the way from a $fh set to binmode(), through read() into a buffer, to a substr() operation on that string...
      Further I thought binmode() is more a Win32 thing and as my code won't ever hit the MS world, I seldomly use it. As *you* refer to it, I think I should go back to using it.

      Just for completeness, my former code updated:
          my $chunk_size = $cnf->{chunk_size} || 1024;
          my $bytes_in   = $cnf->{bytes_in}   || 0;
          my $bytes_out  = $cnf->{bytes_out}  || undef;
          my $pos        = $bytes_in;
          binmode($fh);
          seek($fh, $bytes_in, 0);
          while ( read( $fh, my $buffer, $chunk_size ) ) {
              $pos += $chunk_size;    # alt.: += length($buffer)
              if ( defined($bytes_out) && $pos > $bytes_out ) {
                  # make last chunk shorter
                  print substr($buffer, 0, ($chunk_size - ($pos - $bytes_out)) );
                  last;
              } elsif ( defined($bytes_out) && $pos == $bytes_out ) {
                  print $buffer;
                  last;
              } else {
                  print $buffer;
              }
          }

        I wanted to prevent it "falling back" into character mode and *always* return byte offsets.

        By using bytes, you do exactly the opposite.

            require bytes;

            $x = "\xC9\xCA\xCB\xCC";

            utf8::downgrade($x);  print(substr($x,1,1) eq "\xCA" ?1:0,"\n");         # 1
            utf8::upgrade($x);    print(substr($x,1,1) eq "\xCA" ?1:0,"\n");         # 1

            utf8::downgrade($x);  print(bytes::substr($x,1,1) eq "\xCA" ?1:0,"\n");  # 1
            utf8::upgrade($x);    print(bytes::substr($x,1,1) eq "\xCA" ?1:0,"\n");  # 0

        bytes gives access to the internal storage format of the string. It has nothing to do with whether the string only contains bytes or not.
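
        A minimal sketch of the same point using length instead of substr. The variable names are only illustrative. The characters in the string never change, but the number of bytes in its internal storage does, and that storage is what the bytes:: functions look at.

        ```perl
        use strict;
        use warnings;
        require bytes;

        my $s = "\xC9\xCA\xCB\xCC";    # four characters, all below 256

        utf8::downgrade($s);            # internal storage: one byte per character
        my @downgraded = ( length($s), bytes::length($s) );

        utf8::upgrade($s);              # internal storage: UTF-8, two bytes per character here
        my @upgraded = ( length($s), bytes::length($s) );

        print "downgraded: length=$downgraded[0] bytes::length=$downgraded[1]\n";   # 4 and 4
        print "upgraded:   length=$upgraded[0] bytes::length=$upgraded[1]\n";       # 4 and 8
        ```

        length reports 4 both times, which is why plain substr is the safe choice for data read from a binmode'd handle.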

        bytes::substr will probably do what you want. substr definitely will.

        Further I thought binmode() is more a Win32 thing and as my code won't ever hit the MS world, I seldomly use it.

        binmode does two things:
        It prevents CRLF translation when the :crlf layer is in use. That is normally only the case on Windows.
        It removes :encoding layers to prevent decoding. There shouldn't be any, but you're the one who's worried.
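
        A small sketch of the second point, assuming an in-memory file and a UTF-8 encoding layer purely for illustration (neither appears in your code). With the layer in place, read counts characters; after binmode it counts bytes.

        ```perl
        use strict;
        use warnings;

        my $raw = "\xC3\x89abc";    # UTF-8 encoding of U+00C9, then "abc": 5 bytes

        # Handle with a decoding layer: read() works in characters.
        open(my $fh_text, '<:encoding(UTF-8)', \$raw) or die($!);
        read($fh_text, my $decoded, 1);
        close($fh_text);

        # Same layers, but binmode() strips the decoding: read() works in bytes.
        open(my $fh_bin, '<:encoding(UTF-8)', \$raw) or die($!);
        binmode($fh_bin);
        read($fh_bin, my $bytes, 1);
        close($fh_bin);

        printf "with layer:    U+%04X\n", ord($decoded);   # U+00C9
        printf "after binmode: 0x%02X\n", ord($bytes);     # 0xC3
        ```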

        Just for completeness, my former code updated:

        I don't know why you're asking for help (or why I'm giving it) if you're sticking with that complex, buggy code when the simpler solution even does error checking.

        Despite prompting, you never indicated whether it matters where you leave the file pointer when you're done. Does it?
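
        If it does matter, here is a rough sketch of a variant that never reads past the end of the range, so the file pointer ends up exactly at $bytes_out. The helper name copy_range, the in-memory handles and the sample data are made up for the demonstration, and $bytes_out is assumed to be exclusive (one past the last byte wanted), as in the loop I posted earlier.

        ```perl
        use strict;
        use warnings;
        use Fcntl qw( SEEK_SET );

        # Hypothetical helper: copy bytes [$bytes_in, $bytes_out) from $in to $out
        # and return the position of $in afterwards.
        sub copy_range {
            my ($in, $out, $bytes_in, $bytes_out, $chunk_size) = @_;
            binmode($in);
            seek($in, $bytes_in, SEEK_SET) or die($!);
            my $to_read = $bytes_out - $bytes_in;
            while ($to_read > 0) {
                # Never ask for more than what is left of the range.
                my $want = $to_read < $chunk_size ? $to_read : $chunk_size;
                my $rv = read($in, my $buffer, $want);
                die($!) if !defined($rv);
                die("Premature eof") if !$rv;
                print {$out} $buffer;
                $to_read -= $rv;
            }
            return tell($in);
        }

        # Demonstration on in-memory files.
        my $data = join '', 'a' .. 'z';
        open(my $in, '<', \$data) or die($!);
        my $copied = '';
        open(my $out, '>', \$copied) or die($!);
        my $pos = copy_range($in, $out, 5, 12, 4);
        close($out);
        print "copied=$copied pos=$pos\n";    # copied=fghijkl pos=12
        ```

        The only difference from the earlier loop is that the size passed to read is capped at $to_read, so no trimming with substr is needed.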