in reply to Copying binary data efficiently

Now that I've had time to get over my embarrassment, I'm still a bit curious as to how to do this if I were inserting into the file rather than overwriting. If the file is very large, what would be the best/fastest/most memory-efficient way to copy big sections (including up to the end of the file) of an input file to an output file? A character at a time sounds inefficient, so I'm assuming larger chunks would be better, but then you need to handle the end of the file carefully, etc.

Cheers!

Re: Re: Copying binary data efficiently
by sgifford (Prior) on Jul 31, 2003 at 04:26 UTC

    Keep reading through the file, a block at a time, until you get to the block with your character. Print out the block up to your character, then whatever you need to insert, then the rest of the block. Then just loop until EOF, again a block at a time, for the rest of the file.

    Something like this:

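    #!/usr/bin/perl -w
    use strict;

    my ($replacepos, $replacestr) = @ARGV;
    my $BLOCKSIZE = 10;
    my $buf;
    my $pos = 0;    # offset of the current block within the input

    # The data is binary, so turn off any newline/encoding translation.
    binmode(STDIN);
    binmode(STDOUT);

    while (read(STDIN, $buf, $BLOCKSIZE)) {
        # Does this contain our byte?
        if (($replacepos >= $pos) && ($replacepos < ($pos + length($buf)))) {
            # Note: the "+1" drops the byte at $replacepos (a replace);
            # for a pure insert, use substr($buf, $replacepos - $pos).
            print substr($buf, 0, $replacepos - $pos),
                  $replacestr,
                  substr($buf, $replacepos - $pos + 1);
        }
        else {
            print $buf;
        }
        $pos += length($buf);
    }
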
    10 is a good block size for demonstration, because it's easy to verify the edge cases. In real life, on most systems 4096 (or a multiple of it) is a good block size, since it typically matches the size the system really reads from the disk.
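
    For a pure insertion (nothing dropped from the input), the same loop can be wrapped in a sub. This is only a sketch: copy_with_insert and its parameter names are made up for illustration, it assumes a non-negative offset, and it appends the inserted string when the offset falls at or past the end of the input.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the original post: copy everything from
# $in to $out in blocks, writing $insert just before input byte $offset.
# Returns the number of input bytes copied.
sub copy_with_insert {
    my ($in, $out, $offset, $insert, $blocksize) = @_;
    $blocksize ||= 4096;    # assumed default block size
    binmode($in);
    binmode($out);

    my $pos  = 0;    # input offset of the first byte of the current block
    my $done = 0;    # true once $insert has been written
    my $buf;

    while (1) {
        my $n = read($in, $buf, $blocksize);
        die "read failed: $!" unless defined $n;
        last if $n == 0;    # end of file

        if (!$done && $offset < $pos + $n) {
            # The insertion point lies inside this block: split it there.
            my $split = $offset - $pos;
            print {$out} substr($buf, 0, $split), $insert, substr($buf, $split)
                or die "write failed: $!";
            $done = 1;
        }
        else {
            print {$out} $buf or die "write failed: $!";
        }
        $pos += $n;
    }

    # The offset was at or beyond end of file, so append instead.
    unless ($done) {
        print {$out} $insert or die "write failed: $!";
    }
    return $pos;
}

# Example call (as a filter): copy_with_insert(\*STDIN, \*STDOUT, 5, "XYZ");
```

    Checking the return value of read() rather than the length of the buffer is what takes care of the short last block, and the $done flag means the rest of the file is copied straight through once the insertion has happened.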

    If it's a genuinely tremendous file and there are actually performance problems, using Mmap might be more efficient (it is in C; I haven't used it in Perl).

      Thanks sgifford! Your reply is exactly what I was after, and you have also introduced me to the Mmap module which I haven't used before, but now that I'm aware of it I'm pretty sure to find a use or two for it in the future.

      Does anyone know how reliable and well tested that module is? Good or bad experiences?

      Cheers!