in reply to bin mode file copy

First, when you open a file in Perl, it is opened in text mode by default. That means certain bytes, such as \n (newline), may be treated specially, for example translated to or from \r\n on Windows.

If the file you are going to copy is just a bunch of bits, then you should do something like this so that no such translation is applied:

my $out = "output_path";
# open for writing, then switch the handle to binary mode
open(OUTBIN, '>', $out) || die "unable to open $out: $!";
binmode(OUTBIN)         || die "unable to set binmode $out: $!";
The above says that I'm opening the $out file for writing and that I'm just going to send raw bytes to it. This next part does the copy:
open(INBIN, '<', $x) || die "unable to open $x: $!";
binmode(INBIN)       || die "unable to set binmode $x: $!";
# read() returns the number of bytes read, or 0 at end of file
while (read(INBIN, my $buff, 8 * 2**10)) {
    print OUTBIN $buff;
}
close(INBIN)  || die "unable to close inbin: $!";
close(OUTBIN) || die "unable to close outbin: $!";
The above says: open file path $x, then set it for binary reads. Each read requests up to 8 * 2**10 = 8 * 1024 = 8192 bytes. read() returns the number of bytes it actually read, and $buff holds exactly that many bytes, so a final read that is shorter than the maximum is no problem!
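
For anyone who wants the two pieces in one place, here is a minimal sketch of the same copy. It is only an illustration, not the one true way: it assumes lexical filehandles and the ':raw' layer in open() in place of the separate binmode() calls, and the names copy_binary, $src and $dst are made up for this example.

```perl
use strict;
use warnings;

# Copy $src to $dst byte for byte and return the number of bytes copied.
# copy_binary, $src and $dst are hypothetical names used for illustration.
sub copy_binary {
    my ($src, $dst) = @_;

    # ':raw' in the open mode has the same effect as calling binmode()
    # on the handle right after opening it.
    open(my $in,  '<:raw', $src) or die "unable to open $src: $!";
    open(my $out, '>:raw', $dst) or die "unable to open $dst: $!";

    my $total = 0;
    while (1) {
        # Ask for up to 8192 bytes; read() says how many it delivered.
        my $got = read($in, my $buff, 8 * 2**10);
        die "read error on $src: $!" unless defined $got;
        last if $got == 0;    # end of file

        print {$out} $buff or die "write error on $dst: $!";
        $total += $got;
    }

    close($in)  or die "unable to close $src: $!";
    close($out) or die "unable to close $dst: $!";
    return $total;
}
```

Calling copy_binary("in.dat", "out.dat") with two paths of your own should leave out.dat with exactly the bytes of in.dat. Unlike the loop above, this version also tells a read error (undef) apart from end of file (0).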

2**10=1024 is a "magic number" for the hardware.
A typical Unix system will read from the hard drive in increments of 4x that or 4096 bytes. Here the buffer size is twice that or 8192 bytes. It is counter-intuitive, but increasing the buffer size can actually slow things down if you have a smart disk system.

The important part is to call binmode() on both filehandles. For the read, you have to specify a size; a multiple of 1024 is the conventional choice, though any positive size will work.

Re^2: bin mode file copy
by almut (Canon) on Mar 16, 2009 at 20:22 UTC
    A typical Unix system will read from the hard drive in increments of 4x that or 4096 bytes. Here the buffer size is twice that or 8192 bytes. It is counter-intuitive, but increasing the buffer size can actually slow things down if you have a smart disk system.

    With respect to disk reads, it doesn't really matter what size you specify with Perl's read(). Perl uses its own internal buffer anyway, which is 4k (hardcoded in the Perl source, i.e. not configurable, except by recompiling perl). In case you don't believe me, do an strace on your sample code, and you'll see that the underlying system calls always return 4096, independently of whether you specify 1k, 2k, 8k, or whatever size with read().

    But you could use sysread(), which is actually implemented in terms of the read(2) system call, and thus does pass through the size you request. Whether the latter actually maps to disk block read requests is still another story, though... (depends on the OS).

    You might also want to read "4k read buffer is too small".
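
The sysread() suggestion in the reply above could look roughly like the sketch below. It is a sketch under assumptions, not code from the thread: copy_unbuffered, $src, $dst and $chunk are hypothetical names, and the 64k default is an arbitrary value, not a tuning recommendation. syswrite() is paired with sysread() so that buffered and unbuffered I/O are not mixed on the same handle.

```perl
use strict;
use warnings;

# Copy $src to $dst with unbuffered I/O and return the bytes copied.
# copy_unbuffered, $src, $dst and $chunk are hypothetical names.
sub copy_unbuffered {
    my ($src, $dst, $chunk) = @_;
    $chunk ||= 64 * 1024;    # arbitrary default chosen for illustration

    open(my $in,  '<:raw', $src) or die "unable to open $src: $!";
    open(my $out, '>:raw', $dst) or die "unable to open $dst: $!";

    my $total = 0;
    while (1) {
        # sysread() bypasses Perl's buffering and requests $chunk bytes.
        my $got = sysread($in, my $buff, $chunk);
        die "sysread error on $src: $!" unless defined $got;
        last if $got == 0;    # end of file

        # syswrite() may write fewer bytes than asked, so loop until
        # the whole chunk has been written.
        my $off = 0;
        while ($off < $got) {
            my $put = syswrite($out, $buff, $got - $off, $off);
            die "syswrite error on $dst: $!" unless defined $put;
            $off += $put;
        }
        $total += $got;
    }

    close($in)  or die "unable to close $src: $!";
    close($out) or die "unable to close $dst: $!";
    return $total;
}
```

Running a script that calls this under strace should show whether the requested size reaches the read system call on your system; as the reply says, how that maps to disk block requests depends on the OS.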