in reply to Inserting text into the middle of a file without clobbering any other text

You can't insert into the middle of a file, as it is a single logical contiguous unit. The easiest way to do this is to read the file in, printing each line out to a temp file (say, file.txt.update) until you reach the insert point, then print the addition to this temp file, then the rest of the lines. Lastly, rename the temp file to the original file name. As long as the temp file is on the same filesystem (e.g. in the same directory), the rename replaces the old file in a single atomic operation that is all or nothing: it either works or it fails. No half measures and no corrupt files.

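    open OLD, '<', $old or die "Can't read $old: $!";
    open TMP, '>', $tmp or die "Can't write $tmp: $!";
    while (<OLD>) { insert() if /something/; print TMP $_; }
    close TMP or die "Can't close $tmp: $!";
    sub insert { print TMP $insert }   # no comma after TMP
    rename $tmp, $old or die "Can't rename $tmp to $old, Perl whines $!\n";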
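For reference, here is one way to package the same read/copy/rename approach as a self-contained script. It is a minimal sketch, not tachyon's exact code: the sub name insert_before and the .update suffix are illustrative choices, and unlike the loop above (which inserts before every matching line) it inserts only before the first match. The temp file is created next to the original so that rename() stays on one filesystem, which is what makes the swap atomic on Unix-like systems.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: insert $text before the first line of $file that
# matches $pattern (a qr// object). Everything is written to "$file.update"
# first; rename() then swaps the new file into place in one step.
sub insert_before {
    my ($file, $pattern, $text) = @_;
    my $tmp = "$file.update";
    open my $old, '<', $file or die "Can't read $file: $!\n";
    open my $new, '>', $tmp  or die "Can't write $tmp: $!\n";
    my $done = 0;
    while (my $line = <$old>) {
        if (!$done && $line =~ $pattern) {
            print {$new} $text or die "Can't write $tmp: $!\n";
            $done = 1;
        }
        print {$new} $line or die "Can't write $tmp: $!\n";
    }
    close $old;
    close $new or die "Can't close $tmp: $!\n";   # flush everything before renaming
    rename $tmp, $file or die "Can't rename $tmp to $file: $!\n";
    return $done;                                  # true if the pattern was found
}
```

For example, `insert_before('file.txt', qr/^__END__/, "new stuff\n")` puts the new line just above the first `__END__` line; if the pattern never matches, the file is rewritten unchanged and the sub returns false.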

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Re: Inserting text into the middle of a file without clobbering any other text
by Anonymous Monk on Aug 01, 2002 at 15:53 UTC

    Thanks for the info and the code snippet. I've got one question, though. I just thought of using the UNIX 'cat' command to splice together the two halves of the file (output the stuff at the 'insertion point' to a different file, come back to insert the new stuff in the original file, and then use 'cat' to combine the two files).

    I'm going to be working with huge text files, though, so efficiency is a Very Big Deal. Do you know whether a system call to 'cat' will be less efficient than creating the temp file, etc.? I haven't found a comparable Perl function.

    Thanks.

      I'm going to be working with huge text files, though, so efficiency is a Very Big Deal. Do you know if using a system call to 'cat' will be less efficient than creating the temp file etc?

      You could try running some alternatives with the Benchmark module, but I would expect that there are situations where a system call to unix "cat" will be more efficient than doing everything in Perl (and yours may be one such case).
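      As for a Perl counterpart to cat: there is no built-in, but a block copy with sysread/syswrite is short to write, and the Benchmark module's timethese can time it against the external command. This is an illustrative sketch, not a measured result: cat_files and compare_with_cat are hypothetical names, the 1 MB buffer size is an arbitrary choice, and the cat comparison assumes a Unix shell and file names without spaces or shell metacharacters.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Hypothetical pure-Perl stand-in for `cat part1 part2 ... > out`:
# copies each input into $out in 1 MB blocks with sysread/syswrite.
sub cat_files {
    my ($out, @parts) = @_;
    open my $ofh, '>:raw', $out or die "Can't write $out: $!\n";
    for my $part (@parts) {
        open my $ifh, '<:raw', $part or die "Can't read $part: $!\n";
        while (1) {
            my $n = sysread($ifh, my $buf, 1 << 20);
            die "Read from $part failed: $!\n" unless defined $n;
            last if $n == 0;                        # end of this input file
            my $done = 0;
            while ($done < $n) {                    # syswrite may write less than asked
                my $w = syswrite($ofh, $buf, $n - $done, $done);
                die "Write to $out failed: $!\n" unless defined $w;
                $done += $w;
            }
        }
        close $ifh;
    }
    close $ofh or die "Can't close $out: $!\n";
}

# Hypothetical timing helper: runs each variant a fixed 5 times and uses
# the 'all' style so the CPU time of the child `cat` process is shown too.
sub compare_with_cat {
    my ($out, @parts) = @_;
    timethese(5, {
        perl_copy => sub { cat_files($out, @parts) },
        unix_cat  => sub { system("cat @parts > $out") == 0 or die "cat failed: $?\n" },
    }, 'all');
}
```

      Calling `compare_with_cat('joined.txt', 'part1.txt', 'part2.txt')` prints wallclock, user, system and children's times for both variants; the relative numbers depend on your files and disks, so it is worth running on realistically sized data.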

      Maybe you want to structure the process so that it can "batch" a bunch of "random-access" inserts into a single "post-edit" loop that interleaves the sequential and "random insertion" pieces of data. For example, suppose that while you are writing a continuous, sequential output stream, you keep track of multiple "breakpoints" where future additions may need to be spliced in. Suppose further that each time you come up with some data that needs to be spliced into one of those earlier breakpoints, you write this data to some other temp file, or just keep it in a hash keyed by, say, the byte offset where it's supposed to go. When you reach the end of the sequential output, you can read it back in portions (from one breakpoint to the next) and interleave those with the appropriate temp files or hash values, in the required sequence, to produce the intended final output.

      Actually, with this sort of design, I would think Perl could easily provide the simplest and most efficient method: use sysread on the sequentially written temp file to get the chunks between breakpoints, and write these out to the final file, interleaved with the hash elements that store the "post-hoc insertion" pieces. I hope that's clear, but here's a pseudo-code summary:

      open( SEQ, ">sequential_output.file" ) or die "Can't write: $!";
      $byteoffset = 0;
      while there's data to be written {
          # .= rather than = so an insertion already stored here survives
          $breakpoint{$byteoffset} .= "" if this location might need to get an insertion at a later point;
          if I have sequential data {
              print SEQ $sequential_data;
              $byteoffset += length($sequential_data);
          }
          else {
              # this is data that needs to be back-fitted to
              # a byteoffset that I stored earlier
              $back_offset = whatever previous byteoffset is right;
              $breakpoint{$back_offset} .= $backfit_data;
          }
      }
      close SEQ;
      $breakpoint{$byteoffset} .= "";  # do that so the following loop will handle the final chunk

      # final loop: put all the pieces together
      # in the intended order
      open( SEQ, "<sequential_output.file" ) or die "Can't read: $!";
      open( OUT, ">final_output.file" ) or die "Can't write: $!";
      $prev = 0;
      foreach $chunk ( sort { $a <=> $b } keys %breakpoint ) {
          # read only the bytes between the previous breakpoint and this one
          sysread( SEQ, $chunkdata, $chunk - $prev );
          print OUT $chunkdata, $breakpoint{$chunk};
          $prev = $chunk;
      }
      close OUT;
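      A runnable version of the pseudo-code above might look like the following. It is a sketch under stated assumptions: the event list ([mark => $label], [data => $bytes], [backfit => $label, $bytes]) is a hypothetical stand-in for however your program actually produces its output, breakpoints are named by labels that map to byte offsets, and all data is assumed to be byte strings so that length() counts bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical driver: build $final from a list of events.
#   [ mark    => $label ]          remember the current byte offset as $label
#   [ data    => $bytes ]          sequential output, appended to $seq_file
#   [ backfit => $label, $bytes ]  text to splice in at $label's offset
# Assumes byte strings (no wide characters), so length() == bytes written.
sub assemble {
    my ($seq_file, $final, @events) = @_;
    my (%offset_of, %insert_at);   # label => offset, offset => pending text
    my $offset = 0;

    open my $seq, '>:raw', $seq_file or die "Can't write $seq_file: $!\n";
    for my $ev (@events) {
        my ($kind, @args) = @$ev;
        if ($kind eq 'mark') {
            $offset_of{$args[0]} = $offset;
            $insert_at{$offset} .= '';                # keep any earlier insert
        }
        elsif ($kind eq 'data') {
            print {$seq} $args[0] or die "Write failed: $!\n";
            $offset += length $args[0];
        }
        elsif ($kind eq 'backfit') {
            my $at = $offset_of{$args[0]};
            die "Unknown breakpoint '$args[0]'\n" unless defined $at;
            $insert_at{$at} .= $args[1];
        }
        else { die "Unknown event '$kind'\n" }
    }
    close $seq or die "Can't close $seq_file: $!\n";
    $insert_at{$offset} .= '';                        # so the last chunk is copied too

    # Second pass: copy the bytes between consecutive breakpoints,
    # each followed by whatever was back-filled at that breakpoint.
    open $seq, '<:raw', $seq_file or die "Can't read $seq_file: $!\n";
    open my $out, '>:raw', $final or die "Can't write $final: $!\n";
    my $prev = 0;
    for my $at (sort { $a <=> $b } keys %insert_at) {
        my $want  = $at - $prev;
        my $chunk = '';
        while (length($chunk) < $want) {              # sysread may return short reads
            my $n = sysread($seq, $chunk, $want - length($chunk), length($chunk));
            die "Read failed: $!\n" unless defined $n;
            die "Unexpected end of $seq_file\n" if $n == 0;
        }
        print {$out} $chunk, $insert_at{$at} or die "Write failed: $!\n";
        $prev = $at;
    }
    close $seq;
    close $out or die "Can't close $final: $!\n";
}
```

      With the events header, mark 'toc', body, backfit 'toc' => "table of contents\n", end, the final file reads header, table of contents, body, end: the table of contents lands at the offset recorded before the body was written, even though it was produced afterwards.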