PerlMonks  

One of the requirements I have is to create a log that indicates which record the TR actually modified. Any ideas on how to do this whilst retaining the performance?

I would make the buffer size a multiple of the fixed record size. The non-power-of-two-ness may have a slight impact on the performance, but it will probably be negligible. I would then perform the translation on record-sized chunks of the buffer, by using substr as an lvalue; something like:

    my $recno = 0;
    while( my $read = sysread $FH, $buffer, $RECSIZE * $MULTIPLE ) {
        my $readPos = sysseek $FH, 0, 1; ## simulate "systell()".
        my $recs = int( $read / $RECSIZE ); ## the final read may be short
        for( 0 .. $recs - 1 ) {
            if( my $changed = substr( $buffer, $_ * $RECSIZE, $RECSIZE ) =~ tr[...][...] ) {
                print LOG "Changed $changed chars in record: ", $recno + $_, "\n";
                # Calculate position of modified record.
                my $writePos = ( $recno + $_ ) * $RECSIZE; ## Check this calc! Untested!
                sysseek $FH, $writePos, 0;
                syswrite $FH, substr( $buffer, $_ * $RECSIZE, $RECSIZE );
                sysseek $FH, $readPos, 0; ## Restore read position if we moved it.
            }
        }
        $recno += $recs;
    }
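To see the two idioms that loop relies on in isolation, here is a minimal in-memory sketch. The record size, the sample buffer and the `tr[a][A]` translation are made-up stand-ins, not anything from the original file:

```perl
use strict;
use warnings;

my $RECSIZE = 4;                  # hypothetical record size
my $buffer  = 'abcdxxxxaaaa';     # three 4-byte records

my @log;
for my $i ( 0 .. length( $buffer ) / $RECSIZE - 1 ) {
    # substr() is used as an lvalue here, so tr/// edits $buffer in place
    # and returns the number of characters it translated in that record.
    my $changed = substr( $buffer, $i * $RECSIZE, $RECSIZE ) =~ tr[a][A];
    push @log, "record $i: $changed changed" if $changed;
}

print "$_\n" for @log;
print "$buffer\n";
```

Only the records in which tr/// actually translated something end up in the log, which is all the logging requirement needs.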

There are a few things to note about that loop:

  • The read is a multiple of the fixed record size.
  • The records are translated in-place, but one at a time, by using substr as an lvalue to step through the buffer.
  • tr/// returns a count of the characters it translates, thereby avoiding the need to make two passes.
  • I've shown only the modified records being re-written--and individually.

    Whether this is a good strategy will depend upon the frequency of modification.

    • If the frequency is low, re-writing small, sparse modifications should give a net gain over re-writing everything.
    • If the frequency is high, then rewriting the whole buffer in a single pass will be quicker.

      Even then, if some buffers do not require any modification, avoiding re-writing those will pay a double benefit: it avoids both the need to back up the read head and the actual write.

      You could make this decision dynamically. Build an array of the modified record numbers as you do the translation, and defer the re-writing until you have processed a complete buffer. If the number of modified records, as a proportion of $MULTIPLE, is greater than some cutoff, re-write the entire buffer; otherwise re-write just the modified records individually.

      Implementing this, and deciding the breakpoints is left as an exercise for the reader :)
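For what it is worth, one possible shape for that deferred, per-buffer decision is sketched below. It is untested against a really large file. The sub name `translate_records`, its `$cutoff` parameter, the `tr[;][,]` translation and the temp-file demo are all hypothetical stand-ins, and it assumes the file is opened read/write in raw mode:

```perl
use strict;
use warnings;
use Fcntl qw( SEEK_SET );
use File::Temp qw( tempfile );

# Hypothetical helper: translate fixed-size records in place and return the
# numbers of the records that changed. $cutoff is the fraction of a buffer's
# records above which the whole buffer is rewritten with a single write.
sub translate_records {
    my( $path, $recsize, $multiple, $cutoff ) = @_;
    open my $fh, '+<:raw', $path or die "Cannot open $path: $!";

    my $offset = 0;    # file offset of the current buffer
    my @changed;       # record numbers modified so far
    while( my $read = sysread $fh, my $buffer, $recsize * $multiple ) {
        my $firstRec = $offset / $recsize;

        # Translate first; remember which records changed, write later.
        my @dirty;
        for my $i ( 0 .. int( $read / $recsize ) - 1 ) {
            push @dirty, $i
                if substr( $buffer, $i * $recsize, $recsize ) =~ tr[;][,];
        }

        if( scalar( @dirty ) / $multiple > $cutoff ) {
            # Many changes: one seek and one write for the whole buffer.
            sysseek $fh, $offset, SEEK_SET;
            syswrite $fh, $buffer;    # ends at the next read position
        }
        elsif( @dirty ) {
            # Few changes: rewrite only the modified records.
            for my $i ( @dirty ) {
                sysseek $fh, $offset + $i * $recsize, SEEK_SET;
                syswrite $fh, substr( $buffer, $i * $recsize, $recsize );
            }
            sysseek $fh, $offset + $read, SEEK_SET;   # restore read position
        }
        push @changed, map { $firstRec + $_ } @dirty;
        $offset += $read;
    }
    close $fh;
    return @changed;
}

# Demo on a throwaway file: nine 4-byte records, four records per buffer.
my( $tmp, $path ) = tempfile( UNLINK => 1 );
binmode $tmp;
print {$tmp} 'a;b;', 'cccc', 'dddd', 'eeee',
             'f;;;', 'g;;;', 'h;;;', 'iiii',
             'jj;j';
close $tmp;

my @changed = translate_records( $path, 4, 4, 0.5 );
print "Changed records: @changed\n";
```

With a cutoff of 0.5, the first and last buffers in the demo (one modified record each) take the record-by-record path, and the middle buffer (three of four modified) is rewritten whole. Where the real breakpoint lies will depend on the disk and the data, so treat 0.5 as a placeholder to be measured.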


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco.
Rule 1 has a caveat! -- Who broke the cabal?

In reply to Re^7: Muy Large File by BrowserUk
in thread Muy Large File by BuddhaLovesPerl
