mildside has asked for the wisdom of the Perl Monks concerning the following question:

I recently needed to write a Perl script that had to replace a byte at position 148 of a large binary file. The replacement byte was dynamic, depending on the file's contents.

I read in and wrote out the first 148 bytes of the file (using read and print), inserted my new byte, and then used `/usr/xpg4/bin/tail -c +150 $infile >> $outfile` to append the remainder of the file.

This worked fine, but I was wondering if there is a better way to do it. The code does not need to be portable, so that's not an issue. I was under time pressure to get this going, so this seemed the quickest way to make it work. Is there a good, efficient way in Perl to simply read in and write out large amounts of binary (not text) data?

Thanks in advance, and Cheers!

Re: Copying binary data efficiently
by adrianh (Chancellor) on Jul 30, 2003 at 23:32 UTC

    If all you want to do is change a single byte in a file - rather than copy the file - take a look at seek.

    You can use it to move to an arbitrary point in a file. You can then just overwrite the byte you want to change rather than copying the whole file.
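    A minimal sketch of the idea (the filename, offset, and replacement byte here are just placeholders):

    #!/usr/bin/perl -w
    use strict;

    # Overwrite a single byte in place; no copying of the file's contents.
    my $file   = 'data.bin';
    my $offset = 148;
    my $byte   = 'X';

    open(FH, '+<', $file) or die "open $file: $!";    # read/write mode
    binmode(FH);                                      # raw bytes, not text
    seek(FH, $offset, 0) or die "seek: $!";           # whence 0 = from start of file
    print FH $byte;                                   # overwrites exactly one byte
    close(FH) or die "close: $!";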

      Doh - Of course!! In my job I'm so used to reading text data in one form and writing it out to a new file in an entirely different form that it didn't even occur to me to just alter the existing file.

      So I should have opened the file with the "+<" mode for read/write, seeked to position 148, and written the byte there. (I would still need to copy the file first, as I need to keep the original.)

      Talk about closed thinking - I can't believe I overlooked that. Thanks adrianh and ++ (once I get some more votes!).

      Cheers!

Re: Copying binary data efficiently
by mildside (Friar) on Jul 31, 2003 at 02:40 UTC
    Now that I've had time to get over my embarrassment, I'm still a bit curious as to how to do this if I were inserting into the file rather than overwriting. If the file is very large, what would be the best/fastest/most memory-efficient way to copy big sections of an input file (including up to the end of the file) to an output file? A character at a time sounds inefficient, so I'm assuming larger chunks would be better, but then you need to handle the end of the file carefully, etc.

    Cheers!

      Keep reading through the file, a block at a time, until you get to the block with your character. Print out the block up to your character, then whatever you need to insert, then the rest of the block (for a replacement rather than an insertion, skip the byte being replaced). Then just loop until EOF, again a block at a time, for the rest of the file.

      Something like this:

      #!/usr/bin/perl -w
      use strict;

      my ($replacepos, $replacestr) = @ARGV;
      my $BLOCKSIZE = 10;
      my $buf;
      my $pos = 0;

      # Treat both streams as raw bytes.
      binmode(STDIN);
      binmode(STDOUT);

      while (read(STDIN, $buf, $BLOCKSIZE)) {
          # Does this block contain our byte?
          if (($replacepos >= $pos) && ($replacepos < $pos + length($buf))) {
              print substr($buf, 0, $replacepos - $pos),
                    $replacestr,
                    substr($buf, $replacepos - $pos + 1);
          }
          else {
              print $buf;
          }
          $pos += length($buf);
      }
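      Assuming the script above is saved as replace-byte.pl (the name is just for illustration), you could run it like:

      perl replace-byte.pl 148 X < in.bin > out.bin

      which writes a copy of in.bin to out.bin with the byte at (zero-based) offset 148 replaced by 'X'.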
      10 is a good block size for demonstration, because it makes the edge cases easy to verify. In real life, 4096 is a good block size on most systems, since it matches the size the system actually reads from the disk in one operation.

      If it's a genuinely tremendous file and there are actual performance problems, using Mmap might be more efficient (it is in C; I haven't used it in Perl).
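      For what it's worth, here is an untested sketch of the in-place edit through a memory map. It assumes the Sys::Mmap interface from CPAN, and the filename and offset are placeholders:

      #!/usr/bin/perl -w
      use strict;
      use Sys::Mmap;   # assumed CPAN module providing mmap()/munmap()

      my $file   = 'data.bin';
      my $offset = 148;

      open(FH, '+<', $file) or die "open $file: $!";
      my $map;
      # A length of 0 maps the whole file; MAP_SHARED writes changes back to disk.
      mmap($map, 0, PROT_READ|PROT_WRITE, MAP_SHARED, FH)
          or die "mmap failed";
      substr($map, $offset, 1) = 'X';   # same-length replacement only
      munmap($map);
      close(FH);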

        Thanks sgifford! Your reply is exactly what I was after, and you have also introduced me to the Mmap module, which I hadn't used before. Now that I'm aware of it, I'm sure I'll find a use or two for it in the future.

        Does anyone know how reliable and well tested that module is? Good or bad experiences?

        Cheers!