in reply to binmode copy loses final byte

Your file's size needs to be a multiple of sizeof(int) bytes. It probably isn't.


At a glance, I see three bugs:

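@bufAry = unpack('I*', $buf); assumes the buffer contains a multiple of sizeof(int) bytes. When it doesn't, unpack silently ignores the leftover bytes at the end, so they never make it into the output.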
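A minimal demonstration, assuming a buffer exactly one byte longer than a native int (the byte values are arbitrary): unpack with I* returns only the complete ints, so packing them again yields a buffer one byte shorter than the input.

```perl
use strict;
use warnings;

# Native int size for this perl: usually 4, but it's platform-dependent.
my $int_size = length( pack( 'I', 0 ) );

# One whole int's worth of bytes, plus one extra trailing byte.
my $buf = "\x01" x ( $int_size + 1 );

my @bufAry   = unpack( 'I*', $buf );   # the trailing byte can't form an int
my $repacked = pack( 'I*', @bufAry );

printf "in: %d bytes, ints: %d, out: %d bytes\n",
   length( $buf ), scalar( @bufAry ), length( $repacked );
```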

Similarly, your output loop always starts writing from the start of the buffer, even if some of those bytes were already written by an earlier partial write. Why not just use print? It takes care of partial writes for you.
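If you do stick with syswrite, the fix is to pass its OFFSET argument so each call resumes where the previous one stopped. A minimal sketch; write_all is a hypothetical helper name, and File::Temp is used only to get a scratch file to write to:

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Hypothetical helper: write all of $buf, resuming after any partial write.
sub write_all {
    my ( $fh, $buf ) = @_;
    my $written = 0;
    while ( $written < length( $buf ) ) {
        # OFFSET ($written) makes each call continue where the last one stopped.
        my $n = syswrite( $fh, $buf, length( $buf ) - $written, $written );
        defined( $n )
           or die( "Error writing: $!\n" );
        $written += $n;
    }
    return $written;
}

my ( $fh, $qfn ) = tempfile( UNLINK => 1 );   # scratch file for the demo
binmode( $fh );

my $data = pack( 'L<*', 1 .. 1000 );   # 4000 bytes of sample output
write_all( $fh, $data );
close( $fh )
   or die( "Error writing to `$qfn`: $!\n" );

printf "%d bytes written, file is %d bytes\n", length( $data ), -s $qfn;
```

With print, this bookkeeping is done for you, which is why it's the simpler choice here.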

my @bufAry = []; places a reference to an anonymous array in the array. That makes no sense. Good thing you overwrite this before using @bufAry. Remove the incorrect and useless initialization.
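A quick way to see the difference between the two declarations:

```perl
use strict;
use warnings;

my @wrong = [];   # one element: a reference to a new, empty anonymous array
my @right;        # a freshly declared array is already empty

printf "wrong: %d element(s), ref type %s\n", scalar( @wrong ), ref( $wrong[0] );   # wrong: 1 element(s), ref type ARRAY
printf "right: %d element(s)\n", scalar( @right );                                  # right: 0 element(s)
```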

Furthermore,

The lack of use strict; or equivalent is sad. ALWAYS use this.

There's also the useless undef($buf);. Actually, it's worse than useless. It forces the string buffer to be deallocated, only to have it reallocated on the next read. This wastes CPU, and it can fragment your memory.
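read() already replaces the contents of its buffer argument on every call, so a single variable declared outside the loop can simply be reused. A small sketch reading from an in-memory filehandle; the 10-byte input and the 4-byte chunk size are arbitrary choices for illustration:

```perl
use strict;
use warnings;

my $data = join( '', map { chr( $_ ) } 0 .. 9 );   # 10 bytes of sample input
open( my $in_fh, '<', \$data )
   or die( "Can't open in-memory handle: $!\n" );
binmode( $in_fh );

my $buf;   # declared once and reused for every chunk
my @chunk_lens;
while ( 1 ) {
    # read() resizes $buf to exactly the bytes read; no undef() needed.
    my $n = read( $in_fh, $buf, 4 );
    defined( $n )
       or die( "Error reading: $!\n" );
    last if !$n;
    push( @chunk_lens, length( $buf ) );
}
close( $in_fh );

print "@chunk_lens\n";   # 4 4 2
```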

You're needlessly using global variables for your handles. Don't do that. Use lexical variables like you do everywhere else. We're not in the 1990s.

Avoid two-arg open. Again, we're not in the 1990s.
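For reference, the lexical-handle, three-arg form looks like this. The mode (including the :raw layer, which takes the place of a separate binmode call) is its own argument, so the file name is never parsed for mode characters. The File::Temp scratch file is only there to make the sketch self-contained.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

my ( undef, $qfn ) = tempfile( UNLINK => 1 );   # scratch file for the demo

open( my $out_fh, '>:raw', $qfn )
   or die( "Can't create `$qfn`: $!\n" );
print( $out_fh "\x00\x01\xFF" )
   or die( "Error writing to `$qfn`: $!\n" );
close( $out_fh )
   or die( "Error writing to `$qfn`: $!\n" );

open( my $in_fh, '<:raw', $qfn )
   or die( "Can't open `$qfn`: $!\n" );
my $content = do { local $/; <$in_fh> };   # slurp the whole file
close( $in_fh );

printf "%d bytes read back\n", length( $content );   # 3 bytes read back
```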

Finally, I also question whether I is the right format, since it's not the same size on every machine. A fixed-size format such as L< (always 4 bytes, little-endian) avoids that. See Mini-Tutorial: Formats for Packing and Unpacking Numbers.
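A quick way to compare formats on your own perl: I is the native C unsigned int (commonly 4 bytes, but compiler-dependent), L is always exactly 32 bits in native byte order, and L<, V and N are 32 bits with a fixed byte order.

```perl
use strict;
use warnings;

# Byte size of each format, from packing a single zero.
my @formats = qw( I L L< N V );
my %size;
$size{$_} = length( pack( $_, 0 ) ) for @formats;
printf "%-2s %d bytes\n", $_, $size{$_} for @formats;

# Byte order of the value 1: L< and V are little-endian, N is big-endian.
my $le = unpack( 'H*', pack( 'L<', 1 ) );
my $be = unpack( 'H*', pack( 'N',  1 ) );
print "L< $le\n";   # L< 01000000
print "N  $be\n";   # N  00000001
```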

Re^2: binmode copy loses final byte
by ikegami (Patriarch) on Jun 30, 2025 at 19:19 UTC
    #!/usr/bin/perl

    use strict;
    use warnings;

    my $in_qfn  = '...';
    my $out_qfn = '...';

    open( my $in_fh, "<:raw", $in_qfn )
       or die( "Can't open `$in_qfn`: $!\n" );
    open( my $out_fh, ">:raw", $out_qfn )
       or die( "Can't create `$out_qfn`: $!\n" );

    my $int_size = 4;  # length( pack( "L<", 0 ) );

    local $/ = \( 8 * 1024 );
    while ( my $in_buf = <$in_fh> ) {
       length( $in_buf ) % $int_size == 0
          or die( "Invalid input file `$in_qfn`\n" );

       my @ints = unpack( "L<*", $in_buf );

       # Something that modifies `@ints` here.

       print( $out_fh pack( "L<*", @ints ) )
          or die( "Error writing to `$out_qfn`: $!\n" );
    }

    close( $in_fh )
       or die( "Error reading from `$in_qfn`: $!\n" );
    close( $out_fh )
       or die( "Error writing to `$out_qfn`: $!\n" );

    Without readline (<>) and print:

    #!/usr/bin/perl

    use strict;
    use warnings;

    my $in_qfn  = '...';
    my $out_qfn = '...';

    open( my $in_fh, "<:raw", $in_qfn )
       or die( "Can't open `$in_qfn`: $!\n" );
    open( my $out_fh, ">:raw", $out_qfn )
       or die( "Can't create `$out_qfn`: $!\n" );

    my $blk_size = ( stat( $in_fh ) )[ 11 ] || 16384;
    my $int_size = 4;  # length( pack( "L<", 0 ) );
    $blk_size % $int_size == 0
       or die( "Invalid block size\n" );

    my $in_buf = '';
    while ( 1 ) {
       my $bytes_read = sysread( $in_fh, $in_buf, $blk_size, length( $in_buf ) );
       defined( $bytes_read )
          or die( "Error reading from `$in_qfn`: $!\n" );
       $bytes_read
          or last;

       my @ints = unpack( "L<*", $in_buf );
       substr( $in_buf, 0, @ints * $int_size, "" );

       # Something that modifies `@ints` here.

       {
          my $out_buf = pack( "L<*", @ints );
          my $total_to_write = length( $out_buf );
          my $total_written = 0;
          while ( $total_to_write ) {
             my $bytes_written = syswrite( $out_fh, $out_buf, $blk_size, $total_written );
             defined( $bytes_written )
                or die( "Error writing to `$out_qfn`: $!\n" );

             $total_written  += $bytes_written;
             $total_to_write -= $bytes_written;
          }
       }
    }

    length( $in_buf ) == 0
       or die( "Invalid input file `$in_qfn`\n" );

    close( $in_fh )
       or die( "Error reading from `$in_qfn`: $!\n" );
    close( $out_fh )
       or die( "Error writing to `$out_qfn`: $!\n" );

      Both of those examples die with the 'Invalid input file...' message partway through reading an input file.

      An output file is written, but only up to the point where reading died.

      It happens with both a very large *.txt file and a medium-sized *.jpg file. Nothing looks untoward at that point in the *.txt input file; the output file simply ends in the middle of a text line.

        Then give it a valid input file! Your program needs a file that consists of a sequence of "packed" int objects, but you didn't provide such a file. Put differently, you need a file whose size is a multiple of sizeof(int).

        Or, from the other perspective: you wrote a program that requires a file whose size is a multiple of sizeof(int), but your inputs are arbitrary files (text, JPEG), so that's not what the program should expect.
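        Both replies above only detect a bad file after some output has already been written. If you want to reject such a file up front, before creating the output, you could check its size first. A minimal sketch, assuming the same 4-byte L< ints; is_int_aligned is a hypothetical helper, and the two scratch files (8 and 9 bytes) exist only for the demo.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

my $int_size = 4;   # length( pack( "L<", 0 ) )

# Hypothetical helper: true if the file's size is a whole number of ints.
sub is_int_aligned {
    my ( $qfn ) = @_;
    my @st = stat( $qfn )
       or die( "Can't stat `$qfn`: $!\n" );
    return $st[7] % $int_size == 0;   # element 7 of stat() is the size in bytes
}

my %aligned;
for my $len ( 8, 9 ) {
    my ( $fh, $qfn ) = tempfile( UNLINK => 1 );   # scratch file for the demo
    binmode( $fh );
    print( $fh "\x00" x $len )
       or die( "Error writing to `$qfn`: $!\n" );
    close( $fh )
       or die( "Error writing to `$qfn`: $!\n" );

    $aligned{$len} = is_int_aligned( $qfn ) ? 1 : 0;
    printf "%d-byte file: %s\n", $len, $aligned{$len} ? "ok" : "not a multiple of $int_size";
}
```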

        When using a block cipher on a stream of arbitrary length, a padding algorithm is used to make the input size a multiple of the block size before encryption.

        In your case, you could use the following:

        # Always appends 1 to $int_size bytes: zeros, then one byte holding the pad length.
        my $padding_len = $int_size - ( length( $buf ) % $int_size );
        my $padding = ( "\x00" x ( $padding_len - 1 ) ) . chr( $padding_len );
        $buf .= $padding;

        After decrypting, you remove the padding by removing a number of bytes equal to the value of the last byte.

        $buf = substr( $buf, 0, -ord( substr( $buf, -1 ) ) );
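        Putting the two steps together, here is a sketch of a round trip; add_padding and remove_padding are hypothetical helper names, and $int_size is assumed to be 4 to match L<. Because the padding is always at least one byte (a full $int_size bytes when the data is already aligned), the last byte can always be read back as the pad length. This zeros-then-count layout is the scheme known as ANSI X.923.

```perl
use strict;
use warnings;

my $int_size = 4;   # matches the 'L<' format

# Hypothetical helper: append zeros plus a final byte holding the pad length.
sub add_padding {
    my ( $buf ) = @_;
    my $padding_len = $int_size - ( length( $buf ) % $int_size );
    return $buf . ( "\x00" x ( $padding_len - 1 ) ) . chr( $padding_len );
}

# Hypothetical helper: the last byte says how many bytes to drop.
sub remove_padding {
    my ( $buf ) = @_;
    length( $buf )
       or die( "Empty buffer\n" );
    my $padding_len = ord( substr( $buf, -1 ) );
    $padding_len >= 1 && $padding_len <= $int_size && $padding_len <= length( $buf )
       or die( "Invalid padding\n" );
    return substr( $buf, 0, -$padding_len );
}

for my $len ( 0 .. 9 ) {
    my $data   = 'x' x $len;
    my $padded = add_padding( $data );
    # Padded data always unpacks into whole L< ints with nothing left over.
    my $round  = pack( 'L<*', unpack( 'L<*', $padded ) );
    remove_padding( $round ) eq $data
       or die( "Round trip failed for length $len\n" );
}
print "ok\n";
```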