in reply to Re^2: Trying to optimize reading/writing of large text files.
in thread Trying to optimize reading/writing of large text files.

I liked your questions and I up-voted that post. You added some additional information that helped me a lot to understand what you are doing.

The basic issue here is how to make updates to this shared file "atomic", meaning an update either works or doesn't work, with no partial updates allowed.

If, instead of "locking" the LOG file, you want to essentially delete that file and replace it with another file, you need a lock on something else for coordination, because the lock on the LOG file will disappear when you delete it (and I think you need to do that in order to replace it with the TEMP file via rename). No lock on the TEMP file is needed because there will only be one temp file at a time. And a "read lock" on the LOG file does no good. We need to gain exclusive access to this critter and then update it.
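Here is a minimal sketch of that idea, using only core modules. The file names (data.log, data.log.tmp, data.log.lock) and the update_log routine are made up for illustration, and the delete-then-rename fallback is my assumption about platforms where rename will not overwrite an existing file.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical names, for illustration only.
my $log  = 'data.log';
my $temp = "$log.tmp";
my $lock = "$log.lock";   # never deleted or renamed, so a lock on it stays valid

sub update_log {
    my ($modify) = @_;    # callback: takes the old lines, returns the new lines

    # Take an exclusive lock on the separate lock file, not on LOG itself.
    open(my $lock_fh, '>>', $lock) or die "Cannot open $lock: $!";
    flock($lock_fh, LOCK_EX)       or die "Cannot lock $lock: $!";

    # Read the current contents (a missing LOG counts as empty).
    my @lines;
    if (open(my $in, '<', $log)) {
        @lines = <$in>;
        close($in);
    }

    # Write the new contents to TEMP; only the lock holder gets this far.
    open(my $out, '>', $temp) or die "Cannot write $temp: $!";
    print {$out} $modify->(@lines);
    close($out) or die "Cannot close $temp: $!";

    # Replace LOG with TEMP. If rename refuses to overwrite an existing
    # file, fall back to delete-then-rename.
    unless (rename($temp, $log)) {
        unlink($log);
        rename($temp, $log) or die "Cannot rename $temp to $log: $!";
    }

    close($lock_fh);      # closing the handle releases the lock
    return;
}
```

A caller would pass the modification as a code reference, for example update_log(sub { (@_, "new entry\n") }). Readers that need a consistent view could take LOCK_SH on the same lock file before opening LOG.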

I've tried hard to explain this. Let me know what isn't clear. And of course, it's always possible that I've made some error. So please let us know how this works out!

Re^4: Trying to optimize reading/writing of large text files.
by Anonymous Monk on Jan 23, 2012 at 03:27 UTC
    I'm sure that a DB is the best solution, but this script is running in a very restricted environment where there is no access to SQL or even to CPAN modules. So I have to code it in pure Perl.

    The LOG file is (potentially) heavily accessed by numerous script instances. About 90% of the time it's READ access, and 10% is a READ-MODIFY-OVERWRITE process. The code we are discussing here is the READ-MODIFY-WRITE part of the program. Actually, I used "flock LOG, 1" because I wanted to let other instances have read-only access to the LOG (even if its contents are outdated).

    The "flag" file is a good idea. I think it would finally resolve the problem with possible file corruption. But it adds one more file operation and may potentially affect performance. So, I'm going to experiment a little bit and benchmark different versions of this code to see what is the best.

    By the way, I'm still in doubt whether a lock established by "flock LOG, 1" will be removed by the rename() operation. And this is very important to know. When I wrote Version#2, I supposed that the rename() operation uses system functionality to physically overwrite LOG with TEMP (i.e. doesn't interfere with flock). And, since LOG is opened as read-only, and flock is advisory and affects only cooperating scripts, there is probably a chance that the involved scripts will continue to obey this lock until it's released by the close() operation, even if the file was physically overwritten.
    Are these assumptions mistaken?
      The post above is my post. My session had just expired and I was recognized as "Anonymous Monk" :)

      A little update: I have had unmodified Version#2 running for several hours under heavy load (10-15 scripts at once) and there are still no file corruptions. But probably it's just luck. My question about interference between flock() and rename() is still open...
        So, I'm going to experiment a little bit and benchmark different versions of this code to see what is the best.

        If performance matters, this is always a good idea!

        For what you want to do, getting a "read lock" on LOG basically means nothing. You need an exclusive lock. There is no need to get any kind of lock on the temp file, since it should be a unique file anyway. I mean, if it is a unique file for your own access, nobody else is going to mess with it.

        You haven't explained much (actually nothing) about what LOG does in terms of IPC except that this file is used for IPC (Inter Process Communication).

        There is a difference between "guaranteed to work all of the time" and "very high probability of working".

        My question about interference between flock() and rename() is still open.

        If the file is closed, the lock is released. You cannot have a lock unless the file is open. As far as I know, on some systems you cannot rename x=>y unless y doesn't exist. If your process relies upon a "write" lock on y, this won't work (all of the time) on such a system, because you have to delete "y" before renaming x=>y. If your OS allows x to replace an existing file y, then I'd like to see a Perl example.
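One way to check the flock/rename interaction on a particular system is a small experiment like the sketch below. The scratch file names and the can_lock helper are invented for the test, and it assumes a native flock where a second open of the same file in the same process competes with the first (that is how flock(2) is documented on Linux; emulated flock may behave differently).

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempdir);

# Scratch names below are invented for the experiment.
my $dir = tempdir(CLEANUP => 1);
my ($temp, $log) = ("$dir/temp", "$dir/log");

for my $path ($temp, $log) {
    open(my $fh, '>', $path) or die "Cannot create $path: $!";
    close($fh);
}

# Returns 1 if a fresh handle on $path can take an exclusive lock right now.
sub can_lock {
    my ($path) = @_;
    open(my $fh, '+<', $path) or die "Cannot open $path: $!";
    my $got = flock($fh, LOCK_EX | LOCK_NB) ? 1 : 0;
    close($fh);    # releases the lock again if we got it
    return $got;
}

# Hold a lock on the original LOG through this handle.
open(my $held, '+<', $log) or die "Cannot open $log: $!";
flock($held, LOCK_EX) or die "Cannot lock $log: $!";

my $before = can_lock($log);    # same underlying file as $held

rename($temp, $log) or die "rename failed: $!";

my $after = can_lock($log);     # the name now refers to the former TEMP

print "lockable before rename: $before, after rename: $after\n";
close($held);
```

If the second number comes out as 1, the lock did not follow the name: $held still locks the old file, while anyone opening LOG by name gets the new, unlocked one. That is the reason for putting the lock on a separate file.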

        rename, like all file operations, can fail -- check the return status.
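A short test along these lines would show both the status check and what rename does with an existing target on a given system. The file names and the write_file/read_file helpers are hypothetical. The Perl documentation warns that rename's behavior varies between platforms, so treat the result as valid only for the machine it runs on.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory and file names are made up for this test.
my $dir = tempdir(CLEANUP => 1);
my ($x, $y) = ("$dir/x.tmp", "$dir/y.log");

sub write_file {
    my ($path, $text) = @_;
    open(my $fh, '>', $path) or die "Cannot write $path: $!";
    print {$fh} $text;
    close($fh) or die "Cannot close $path: $!";
    return;
}

sub read_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot read $path: $!";
    local $/;                 # slurp the whole file
    my $text = <$fh>;
    close($fh);
    return $text;
}

write_file($x, "new contents\n");
write_file($y, "old contents\n");    # the target already exists

# rename returns true on success and false (with $! set) on failure.
my $ok = rename($x, $y);
if ($ok) {
    print "rename replaced the existing file: ", read_file($y);
}
else {
    print "rename failed: $!\n";
}
```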