in reply to File locking, lock files, and how it all sucks

My experience is that while locking is conceptually simple, virtually nobody ever gets it right. Read the thread starting at RE: RE: Flock Subroutine for a description of some common mistakes. The general theme is that you should lock overall tasks, not individual access operations. For instance, in your example above each process should get an exclusive lock before starting to read the file, and should not release it until it is done writing. That is the only way to avoid races. Also remember that a close loses the lock. And put in error checks; flock can fail for many hard-to-spot reasons (e.g. on Linux, trying to flock a file that is mounted over NFS).
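
In code, the shape I mean is something like this (a quick sketch; the filename is made up and the uppercasing stands in for whatever your real processing is):

    use Fcntl qw(:flock);

    # Take the lock before the first read and hold it until the write
    # is finished. The lock covers the whole task, not one operation.
    open my $fh, '+<', 'data.txt'  or die "open data.txt: $!";
    flock $fh, LOCK_EX             or die "flock data.txt: $!";

    my @lines = <$fh>;             # read under the lock
    my @out   = map { uc } @lines; # stand-in for the real work

    seek $fh, 0, 0                 or die "seek: $!";
    truncate $fh, 0                or die "truncate: $!";
    print {$fh} @out               or die "print: $!";
    close $fh                      or die "close: $!";  # releases the lock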

Some old code of mine which does an OK job of this is at Simple Locking. It uses the sentinel lockfile approach.
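
The sentinel idea in a nutshell (a sketch of the general approach, not the actual code at Simple Locking; the filenames are made up):

    use Fcntl qw(O_CREAT O_EXCL O_WRONLY);

    # O_EXCL makes the create fail if the sentinel already exists,
    # so only one process at a time can win it.
    sysopen(my $lock, 'data.txt.lock', O_CREAT | O_EXCL | O_WRONLY)
        or die "lock held by someone else: $!";
    print {$lock} "$$\n";          # record who owns it
    close $lock;

    # ... do the real work on data.txt here ...

    unlink 'data.txt.lock' or die "unlink data.txt.lock: $!";

Most of the fiddly part of real lockfile code is cleaning up stale sentinels left behind by processes that died without unlinking.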

Oh right, and if you can, you want to use databases, not flatfiles. But you probably knew that...


Re: Re (tilly) 1: File locking, lock files, and how it all sucks
by no_slogan (Deacon) on Aug 22, 2001 at 00:10 UTC
    tilly++
    Also remember that a close loses the lock.
    I'd like to add to that... you lose the lock if you close any filehandle that has the locked file open. Here's some code to make this clearer:
        use Fcntl qw(:flock);

        open A, "<foo"     or die "open: $!";
        flock A, LOCK_SH   or die "flock: $!";
        open B, "<foo"     or die "open: $!";
        close B;           # lock on foo is now lost
    This took me forever to track down when my DBM files started getting corrupted a while back...
      Luckily we're never opening the same file twice... otherwise that would have confused the hell out of me. Thank you for the warning!
Re: Re (tilly) 1: File locking, lock files, and how it all sucks
by tocie (Novice) on Aug 22, 2001 at 00:21 UTC

    That's a great thread... Thank you!

    We're doing error checking and logging everywhere... it's saved me quite a bit of time and frustration.

    We have two problems with the open, lock, read, process, write, close model:

    1. require() and do() do not obey file locking all of the time (i.e. it works on Linux, but not on some other Unixes, and it is flaky as all hell on WinNT/2k). About a third of our data is in files that get included using require() or do(). (The solution I've implemented does an open for read and an exclusive lock, then the require/do; see the sketch after this list.)
    2. If people spawn this thing five times a second, each taking one second to do its work ... Well, I'll put it this way. This product is already notorious for placing a high load on the servers it's placed on (everyone blames it on CPU use, while it's really all the I/O - the thing FLIES when put on a ramdisk :) )
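
    My workaround looks roughly like this (a sketch; the filename is made up):

        use Fcntl qw(:flock);

        # Hold an exclusive lock across the do() so nobody rewrites the
        # file while perl is reading and compiling it. flock does not
        # mind that the handle is only open for reading.
        open my $guard, '<', 'config.pl'  or die "open config.pl: $!";
        flock $guard, LOCK_EX             or die "flock config.pl: $!";
        my $config = do 'config.pl';
        die "do config.pl failed: $@" if $@;
        close $guard;                     # releases the lock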

    Could you give me some arguments that might help reinforce why the open, lock, read, process, write, close model is better? I need to assault the management with it.

    Thank you once again!

      The reason why you should open, lock, read, process, write, then close is that it is the only safe approach. If you do anything else, then there is simply no way to know when you go to write whether the data you read is still valid.
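
      To make the race concrete, here is the lock-after-read ordering going wrong (a sketch with a made-up counter file):

          use Fcntl qw(:flock);

          open my $fh, '+<', 'counter'  or die "open counter: $!";
          my $n = <$fh>;                # read WITHOUT the lock...
                                        # another process can read the
                                        # same value right here
          flock $fh, LOCK_EX            or die "flock counter: $!";
          seek $fh, 0, 0; truncate $fh, 0;
          print {$fh} $n + 1, "\n";     # and one increment is silently lost
          close $fh;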

      Now further comments.

      If you have performance problems, I would start by looking for bottlenecks. Here are some places to check.

      1. Can you speed up what you are doing with the data from the files? For instance if you are loading a lot of data with require/do, then you may find using Storable to be much better (see the first sketch after this list).
      2. Is there redundant extra work you can find ways to avoid? For instance with a flatfile, even a minor edit means rewriting the whole file. With DB_File you can use the rather efficient Berkeley DB database, whose on-disk data structures allow an edit to rewrite only a small part of the file (see the DB_File sketch after this list). (A tip: look up BTREE in the documentation. For semi-random access to large data sets, a BTREE is significantly faster than hashing because it caches better.)
      3. Are there any major points of contention? For instance lots of processes may need to touch the same index file. But if you can get away with using the newer interface to Berkeley DB, BerkeleyDB, then you may be able to have them lock just the section they need, so that multiple processes can manipulate the file at once (see the BerkeleyDB sketch after this list). Alternatively you might split the index file out into multiple editable sections, and have a process produce the old index file through a routine merge.
      4. What does your directory structure look like? When people use flatfiles it is very easy to wind up with directories of thousands of files. However most filesystems store directories as simple linear lists, so every file lookup means rescanning a long run of entries, and that can kill performance. With access functions for your files you can turn large flat directories into nested trees which can be accessed much more efficiently (see the last sketch after this list).
      5. If you can put an abstraction API in front of the disk access, then you can move to a real database. This may give you huge performance benefits. (Not to mention internal sanity improvements.)
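
      For point 1, the Storable version of loading data is about this simple (a sketch; the data and filename are made up):

          use Storable qw(store retrieve);

          my %data = (foo => 1, bar => [2, 3]);    # stand-in for the real data

          # Dump a binary image once, whenever the data changes.
          store \%data, 'data.stor' or die "store: $!";

          # Each process then slurps the image back, which is far cheaper
          # than making perl re-parse source via do/require.
          my $data = retrieve 'data.stor';

      Storable also ships lock_store and lock_retrieve, which do the flock for you.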
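
      For point 2, tying a hash to a BTREE through DB_File looks like this (a sketch; the filename is made up, and DB_File does no locking of its own, so the external flock discipline still applies):

          use DB_File;
          use Fcntl qw(O_CREAT O_RDWR);

          tie my %index, 'DB_File', 'index.db', O_CREAT | O_RDWR, 0644, $DB_BTREE
              or die "tie index.db: $!";

          # Edits touch only the pages involved, not the whole file.
          $index{'some key'} = 'some value';
          delete $index{'stale key'};

          untie %index;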
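
      For point 3, the rough shape of BerkeleyDB's Concurrent Data Store mode, which lets the library manage the locking so readers can work alongside a writer (a sketch; the paths are made up, the home directory must already exist, and it assumes a Berkeley DB built with CDB support):

          use BerkeleyDB;

          # The environment owns the shared locking state.
          my $env = BerkeleyDB::Env->new(
              -Home  => '/var/myapp/db',
              -Flags => DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL,
          ) or die "env: $BerkeleyDB::Error";

          my $db = BerkeleyDB::Hash->new(
              -Filename => 'index.db',
              -Flags    => DB_CREATE,
              -Env      => $env,
          ) or die "db: $BerkeleyDB::Error";

          $db->db_put('some key', 'some value');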
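
      And for point 4, the usual trick behind such access functions is to fan files out by the leading characters of the name (a sketch; the layout is arbitrary, and the subdirectories need creating on first use):

          # Turn one directory of 50,000 files into a two-level tree,
          # so each directory stays small and scans stay cheap.
          sub path_for {
              my $key = shift;
              my ($d1, $d2) = (substr($key, 0, 1), substr($key, 1, 1));
              return "data/$d1/$d2/$key";    # e.g. data/u/s/user1234
          }
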
      OK, that should be enough ideas to keep you busy for the next 6 months... :-)

        Unfortunately, as I noted above, we have to be able to run "everywhere". This excludes using Storable (Gods, I'd *KILL* to use Storable; it's made my life MUCH easier in other projects) and any sort of DBM file, since we cannot rely on anything that must be compiled on the system, might not be the version we expect, or may otherwise be out of date. (Meaning that we've ended up rolling our own mutations of common modules such as MIME::Lite and CGI just to avoid having to rely on preinstalled copies.)

        That throws 1, 2, and 3 right out the window. :( :( :(

        As for 4... The previous permutation of this script occasionally had to read a couple hundred (or thousand) files at once... I've fixed that now. Unfortunately, eliminating that problem only made the other problems that had been lurking in the background come out. (You surely know the story: you have three bugs, so fixing one makes the other two show up even more... :) )

        5 is already in progress... not that it would ever be an official release. Management is sending mixed signals. (Thankfully, the product has a healthy community of code hackers who are continuously adding on and altering things... they'll figure out what I did sooner or later! :) )

        Thank you for the input... this is VERY valuable stuff!