in reply to Re^6: Muy Large File
in thread Muy Large File
One of the requirements I have is to create a log that indicates which record the TR actually modified. Any ideas on how to do this whilst retaining the performance?
I would make the buffer size a multiple of the fixed record size. A buffer size that is not a power of two may have a slight impact on performance, but it will probably be negligible. I would then perform the translation on record-sized chunks of the buffer, by using substr as an lvalue; something like:
    my $recno = 0;
    while( my $read = sysread $FH, $buffer, $RECSIZE * $MULTIPLE ) {
        my $readPos = sysseek $FH, 0, 1; ## simulate "systell()".
        ## Loop over the records actually read; the last buffer may be short.
        for( 0 .. int( $read / $RECSIZE ) - 1 ) {
            if( my $changed = substr( $buffer, $_ * $RECSIZE, $RECSIZE ) =~ tr[...][...] ) {
                print LOG "Changed $changed chars in record: ", $recno + $_, "\n";
                # Calculate positions of modified record.
                my $writePos = ( $recno + $_ ) * $RECSIZE; ## Check this calc! Untested!
                sysseek $FH, $writePos, 0;
                syswrite $FH, substr( $buffer, $_ * $RECSIZE, $RECSIZE );
                sysseek $FH, $readPos, 0; ## Restore read position if we moved it.
            }
        }
        $recno += $MULTIPLE;
    }
There are a few things to note here:
Whether this is a good strategy will depend upon the frequency of modification.
Even then, if some buffers do not require any modification, not re-writing them pays a double benefit: it avoids the need to back up the read head as well as the write itself.
You could make this decision dynamically. Build an array of the modified record numbers as you do the translation, and defer the re-writing until you have processed a complete buffer. If the proportion of modified records out of $MULTIPLE is greater than some cutoff, re-write the entire buffer; otherwise, re-write just the modified records individually.
Implementing this, and deciding the breakpoints is left as an exercise for the reader :)
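For anyone who wants a starting point for that exercise, here is a minimal sketch of the dynamic strategy described above. It is not a tuned implementation: the values of $RECSIZE, $MULTIPLE and $CUTOFF are made up, tr[a-z][A-Z] is only a placeholder for the real translation, and translate_file is a hypothetical name. It also assumes the file was opened read/write in raw mode and consists of whole fixed-length records.

```perl
use strict;
use warnings;
use Fcntl qw( SEEK_SET SEEK_CUR );

## Hypothetical values; tune them for the real data.
my $RECSIZE  = 8;      ## fixed record length in bytes
my $MULTIPLE = 4;      ## records per buffer
my $CUTOFF   = 0.5;    ## above this fraction, re-write the whole buffer

## Translate the file behind $FH in place, logging changed records to $LOG.
## Returns the list of modified record numbers.
sub translate_file {
    my( $FH, $LOG ) = @_;
    my $recno  = 0;
    my $buffer = '';
    my @allChanged;
    while( my $read = sysread $FH, $buffer, $RECSIZE * $MULTIPLE ) {
        my $readPos = sysseek $FH, 0, SEEK_CUR;    ## position after the read
        my $recs    = int( $read / $RECSIZE );     ## records in this buffer
        my @changed;
        for my $i ( 0 .. $recs - 1 ) {
            ## Placeholder translation; substitute the real tr lists here.
            my $n = substr( $buffer, $i * $RECSIZE, $RECSIZE ) =~ tr[a-z][A-Z];
            next unless $n;
            print $LOG "Changed $n chars in record: ", $recno + $i, "\n";
            push @changed, $i;
        }
        if( @changed ) {
            if( @changed / $recs > $CUTOFF ) {
                ## Many records changed: one seek and one write for the lot.
                sysseek $FH, $recno * $RECSIZE, SEEK_SET;
                syswrite $FH, $buffer;
            }
            else {
                ## Few records changed: re-write only those.
                for my $i ( @changed ) {
                    sysseek $FH, ( $recno + $i ) * $RECSIZE, SEEK_SET;
                    syswrite $FH, substr( $buffer, $i * $RECSIZE, $RECSIZE );
                }
            }
            sysseek $FH, $readPos, SEEK_SET;       ## restore the read position
        }
        push @allChanged, map { $recno + $_ } @changed;
        $recno += $recs;
    }
    return @allChanged;
}
```

It would be called along the lines of open my $FH, '+<:raw', $file followed by translate_file( $FH, \*LOG ). Buffers with no modified records are never written and the read position is never moved for them, which is where the double benefit comes from. Where the cutoff should sit depends on the disk and the data, so it is worth benchmarking rather than guessing.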