Re^7: Muy Large File

One of the requirements I have is to create a log that indicates which record the TR actually modified. Any ideas on how to do this whilst retaining the performance?

I would make the buffer size a multiple of the fixed record size. The non-power-of-two-ness may have a slight impact on performance, but it will probably be negligible. I would then perform the translation on record-sized chunks of the buffer, by using substr as an lvalue; something like:

my $recno = 0;
while( sysread $FH, $buffer, $RECSIZE * $MULTIPLE ) {
    my $readPos = sysseek $FH, 0, 1; ## simulate "systell()".
    ## The final read may return fewer than $MULTIPLE records.
    my $records = int( length( $buffer ) / $RECSIZE );
    for( 0 .. $records - 1 ) {
        if( my $changed
            = substr( $buffer, $_ * $RECSIZE, $RECSIZE )
            =~ tr[...][...]
        ) {
            print LOG "Changed $changed chars in record: ", $recno + $_, "\n";
            # Calculate position of the modified record.
            my $writePos = ( $recno + $_ ) * $RECSIZE; ## Check this calc! Untested!
            sysseek $FH, $writePos, 0;
            syswrite $FH, substr( $buffer, $_ * $RECSIZE, $RECSIZE );
            sysseek $FH, $readPos, 0; ## Restore read position if we moved it.
        }
    }
    $recno += $MULTIPLE;
}

There are a few things to note here:

The read is a multiple of the fixed record size.
The records are translated in-place, but one at a time, by using substr as an lvalue to step through the buffer.
tr/// returns a count of the characters it translates, thereby avoiding the need to make two passes.
I've shown only the modified records being re-written--and individually.
Whether this is a good strategy will depend upon the frequency of modification.
- If the frequency is low, re-writing small, sparse modifications should give a net gain over re-writing everything.
- If the frequency is high, then rewriting the whole buffer in a single pass will be quicker.
  Even then, if some buffers do not require any modification, avoiding re-writing those will pay a double benefit: you avoid the need to back up the read position as well as the actual write.
  You could make this decision dynamically. Build an array of the modified record numbers as you do the translation and defer the re-writing until you have processed a complete buffer. If the proportion of modified records in the buffer (out of $MULTIPLE) is greater than some cutoff, re-write the entire buffer; otherwise re-write just the modified records individually.
  Implementing this, and deciding the breakpoints is left as an exercise for the reader :)
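
For anyone who wants a starting point, here is a minimal sketch of that dynamic decision. It is only one possible way to do it. The names translate_file and $CUTOFF, the tiny $RECSIZE and $MULTIPLE values, and the tr[a-z][A-Z] translation are all made up for illustration (they stand in for the real translation and real sizes), and the 0.5 cutoff is a guess, not a measured breakpoint.

```perl
use strict;
use warnings;
use Fcntl qw( SEEK_SET );
use File::Temp qw( tempfile );

# Hypothetical settings, for illustration only.
my $RECSIZE  = 8;     # fixed record length in bytes
my $MULTIPLE = 4;     # records per buffer
my $CUTOFF   = 0.5;   # rewrite the whole buffer above this modified fraction

# Upper-cases a file of fixed-length records in place (tr[a-z][A-Z] stands in
# for the real translation). Returns the numbers of the modified records.
sub translate_file {
    my( $file, $LOG ) = @_;
    open my $FH, '+<:raw', $file or die "Cannot open $file: $!";

    my( $buffer, @allModified );
    my $recno = 0;
    while( 1 ) {
        my $got = sysread $FH, $buffer, $RECSIZE * $MULTIPLE;
        die "Read failed: $!" unless defined $got;
        last unless $got;

        my $bufStart = $recno * $RECSIZE;        # file offset of this buffer
        my $records  = int( $got / $RECSIZE );   # the final buffer may be short

        # Translate one record at a time; defer all writing.
        my @modified;
        for my $i ( 0 .. $records - 1 ) {
            my $changed
                = substr( $buffer, $i * $RECSIZE, $RECSIZE ) =~ tr[a-z][A-Z];
            next unless $changed;
            push @modified, $i;
            print {$LOG} "Changed $changed chars in record: ", $recno + $i, "\n";
        }

        if( @modified > $CUTOFF * $records ) {
            # Dense: one seek and one write cover the whole buffer.
            sysseek $FH, $bufStart, SEEK_SET or die "Seek failed: $!";
            defined syswrite( $FH, $buffer ) or die "Write failed: $!";
        }
        else {
            # Sparse: rewrite only the records that changed.
            for my $i ( @modified ) {
                sysseek $FH, $bufStart + $i * $RECSIZE, SEEK_SET
                    or die "Seek failed: $!";
                defined syswrite( $FH, substr( $buffer, $i * $RECSIZE, $RECSIZE ) )
                    or die "Write failed: $!";
            }
        }
        # Restore the read position only if a write moved it.
        if( @modified ) {
            sysseek $FH, $bufStart + $got, SEEK_SET or die "Seek failed: $!";
        }

        push @allModified, map { $recno + $_ } @modified;
        $recno += $records;
    }
    close $FH or die "Close failed: $!";
    return @allModified;
}

# Demo on a scratch file: six 8-byte records, three of which contain lower case.
my( $tmp, $name ) = tempfile( UNLINK => 1 );
binmode $tmp;
print {$tmp} 'AAAAAAAA', 'bbbbBBBB', 'CCCCCCCC', 'DDDDDDDD', 'eeeeeeee', 'ffffffff';
close $tmp or die "Close failed: $!";

my @hits = translate_file( $name, \*STDOUT );
print "Modified records: @hits\n";
```

With these toy values the first buffer has one modified record out of four, so only that record is rewritten; the second (short) buffer has two out of two, so it is rewritten whole. Where the real breakpoint lies will depend on your disk and data, so benchmark before settling on a cutoff.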

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

Lingua non convalesco, consenesco et abolesco.

Rule 1 has a caveat! -- Who broke the cabal?
