tocie has asked for the wisdom of the Perl Monks concerning the following question:

(Long)

I'm managing a rather large project that uses flat files and a series of hashes written to disk (à la Data::Dumper) to store data.

Some of our clients are trying to push the system to extremes it was never intended nor designed to handle.

Most of them are doing so unintentionally (by placing the system on slow, old servers, etc.).

Unfortunately for me, it's my job to figure out what to do about it. :D

Here's the situation in short.

We're using flock (use Fcntl ':flock'; flock(FILE, LOCK_EX), etc) to valiantly try to prevent the scripts from clobbering each other's data.

It isn't working. Allow me to demonstrate with one of the many situations we've encountered:

Process 1 needs to rewrite and ADD TO some of the data in FILE.
Process 2 needs to rewrite some of the data in FILE.

Process 1 opens, flocks (shared), and reads FILE.
Process 2 opens, flocks (shared), and reads FILE.

Process 1 appends the data internally.
Process 2 alters the data internally.

Process 1 opens, truncates, and flocks FILE (exclusive).
Process 2 is also able to open and truncate FILE, right as Process 1 is doing the same.

Process 2 gets blocked after the open by Process 1's flock.

Process 1 happily dumps its data into FILE.
Process 1 releases the lock & closes FILE.

Process 2 notes the lock has been released and begins rewriting FILE.
Process 2 releases the lock & closes FILE.

FILE now has the data set from Process 1 overwritten by the stale, smaller data set from Process 2, leaving the file quite corrupt.

This is bad.

I need to figure out if there is any way to establish a file lock BEFORE I open a file.

Using a lock file is not out of the question. The currently prevailing idea is to use a handful of lock files (for different functions, etc), then use flock to establish and release locks while work is being done.

At best, this is a messy, awful workaround, and is not suitable for production.

Does anyone have any wisdom to share? Is there any way to lock a file before opening it? Is there a way to use a central lock file without it making the gods cringe?

I thank you in advance for any knowledge you may wish to share, and swear that I will do my best to help others if I have the knowledge.

:)


Re (tilly) 1: File locking, lock files, and how it all sucks
by tilly (Archbishop) on Aug 22, 2001 at 00:00 UTC
    My experience is that while locking is conceptually simple, virtually nobody ever gets it right. Read the thread starting at RE: RE: Flock Subroutine for a description of some common mistakes. The general theme is that you should lock overall tasks, not individual access operations. For instance, in your example above, each process should get an exclusive lock before starting to read the file, and should not lose it until they are done writing. That is the only way to avoid races. Also remember that a close loses the lock. And put in error checks; flock can fail for many hard-to-spot reasons (e.g., on Linux, trying to lock a file that is available through NFS).
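
    As a minimal sketch of that lock-the-whole-task pattern (assuming $path holds the file name and the data fits in memory; rewrite_data() is a hypothetical stand-in for whatever processing you do):

    use Fcntl qw(:flock :seek);

    # Sketch only: take the exclusive lock before reading, and keep it
    # until the rewrite is complete. rewrite_data() is hypothetical.
    open(my $fh, '+<', $path) or die "Can't open $path: $!";
    flock($fh, LOCK_EX)       or die "Can't lock $path: $!";

    my $data = do { local $/; <$fh> };   # read while holding the lock
    my $new  = rewrite_data($data);      # modify in memory

    seek($fh, 0, SEEK_SET)    or die "Can't seek in $path: $!";
    truncate($fh, 0)          or die "Can't truncate $path: $!";
    print {$fh} $new;
    close($fh)                or die "Can't close $path: $!";  # releases the lock

    Note that '+<' fails if the file doesn't exist yet; sysopen with O_RDWR|O_CREAT covers that case.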

    Some old code of mine which does an OK job of this is at Simple Locking. It uses the sentinel lockfile approach.

    Oh right, and if you can, you want to use a database, not flat files. But you probably knew that...

      tilly++
      Also remember that a close loses the lock.
      I'd like to add to that... you lose the lock if you close any filehandle that has the locked file open. Here's some code to make this clearer:
      open A, "<foo";
      flock A, LOCK_SH;
      open B, "<foo";
      close B;           # lock on foo is now lost
      This took me forever to track down when my DBM files started getting corrupted a while back...
        Luckily we're never opening the same file twice... otherwise that would have confused the hell out of me. Thank you for the warning!

      That's a great thread... Thank you!

      We're doing error checking and logging everywhere... it's saved me quite a bit of time and frustration.

      We have two problems with the open, read, process, write, close model:

      1. require() and do() do not obey file locking all of the time (i.e. it works on Linux, but not on some other Unixes, and is flaky as all hell on WinNT/2k). About a third of our data is in files that get included using require() or do(). (The solution I've implemented does an open for read and an exclusive lock, then the require/do; see the sketch after this list.)
      2. If people spawn this thing five times a second, each taking one second to do its work ... Well, I'll put it this way. This product is already notorious for placing a high load on the servers it's placed on (everyone blames it on CPU use, while it's really all the I/O - the thing FLIES when put on a ramdisk :) )
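
      For reference, the require/do workaround from point 1 looks roughly like this (a sketch, not the actual routine, assuming $path names one of the do()-style data files; note that on systems where flock is emulated via fcntl, an exclusive lock may need a read-write handle):

      use Fcntl ':flock';

      # Sketch of the workaround from point 1: hold an exclusive lock on a
      # plain read handle while the file is compiled by do().
      open(my $lock_fh, '<', $path) or die "Can't open $path: $!";
      flock($lock_fh, LOCK_EX)      or die "Can't lock $path: $!";

      my $data = do $path;           # the file can't change while we hold the lock
      die "Couldn't parse $path: $@" if $@;

      close($lock_fh);               # releases the lock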

      Could you give me some arguments that might help reinforce the reasons that the open, lock, read, process, write, close model is better? I need to assault the management with it.

      Thank you once again!

        The reason why you should open, lock, read, process, write, then close is that it is the only safe approach. If you do anything else, then there is simply no way to know when you go to write whether the data you read is still valid.

        Now further comments.

        If you have performance problems, I would start to look for bottlenecks. Here are some places to look.

        1. Can you speed up what you are doing with the data from the files? For instance, if you are loading a lot of data with require/do, then you may find using Storable to be much better (a sketch follows this list).
        2. Is there redundant extra work you can find ways to avoid? With a flat file, even a minor edit means rewriting the whole file. With DB_File you can use the rather efficient Berkeley DB database, whose on-disk data structures allow an edit to rewrite only a small part of the file. (A tip: look up BTREE in the documentation. For semi-random access of large data sets, a BTREE is significantly faster than hashing because it caches better.)
        3. Are there any major points of contention? For instance lots of processes may need to touch the same index file. But if you can get away with using the newer interface to Berkeley DB, BerkeleyDB, then you may be able to have them lock just the section they need, so that multiple processes can manipulate the file at once. Alternately you might split the index file out into multiple editable sections, and have a process produce the old index file through a routine merge.
        4. What does your directory structure look like? When people use flatfiles it is very easy to wind up with directories of thousands of files. However most filesystems have array-based implementations, so that results in a lot of repeated scanning of inodes to access files. This can kill performance. With access functions for your files you can turn large flat directories into nested trees which can be accessed much more efficiently.
        5. If you can put an abstraction API in front of the disk access, then you can move to a real database. This may give you huge performance benefits. (Not to mention internal sanity improvements.)
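
        For point 1, a minimal sketch of what swapping require/do for Storable might look like (the file name and data here are made up):

        use Storable qw(store retrieve);

        # Storable writes a binary image of the structure, so loading it
        # back doesn't make perl re-parse a Data::Dumper/require file.
        my %settings = ( colour => 'blue', limit => 42 );   # made-up data

        store(\%settings, "$path.stor") or die "Can't store $path.stor: $!";
        my $loaded = retrieve("$path.stor");                # hashref back out

        Storable also provides lock_store and lock_retrieve, which take an flock for you, though the lock-the-whole-task caveat above still applies to read-modify-write cycles.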
        OK, that should be enough ideas to keep you busy for the next 6 months... :-)
Re: File locking, lock files, and how it all sucks
by VSarkiss (Monsignor) on Aug 21, 2001 at 23:44 UTC

    The problem isn't in Perl. Your mutual exclusion is incorrect. (Obviously ;-)

    It's OK to allow programs to read the contents of a file with shared locks, but writers must take exclusive locks -- which you're doing. However, if your exclusive lock attempt fails, you need to re-read the file. If you can't take an exclusive lock, someone else is writing to the file, and your in-memory copy is no good.

    In your snippet above, when Process 2 fails to get an exclusive lock, it can't just wait to get the lock, then clobber it anyway. It needs to start from scratch because when the lock fails, it has no idea what the state of the file is.
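
    One way to sketch that re-read (apply_changes() is a hypothetical stand-in for your own processing, $path the file name; the non-blocking attempt just tells the process whether it had to wait):

    use Fcntl qw(:flock :seek);

    # Sketch: read optimistically under a shared lock, then try to upgrade.
    # If the upgrade doesn't succeed immediately, someone else may have
    # written in the meantime, so block for the lock and re-read.
    # apply_changes() is hypothetical.
    open(my $fh, '+<', $path)  or die "Can't open $path: $!";
    flock($fh, LOCK_SH)        or die "Can't lock $path: $!";
    my $data = do { local $/; <$fh> };

    unless (flock($fh, LOCK_EX | LOCK_NB)) {
        flock($fh, LOCK_EX) or die "Can't lock $path: $!";   # block until free
        seek($fh, 0, SEEK_SET);
        $data = do { local $/; <$fh> };                      # start from scratch
    }

    my $new = apply_changes($data);
    seek($fh, 0, SEEK_SET);
    truncate($fh, 0);
    print {$fh} $new;
    close($fh);                                              # releases the lock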

    HTH

    Update
    Forgot what I most wanted to suggest: if you have the time/resources, ditch the files in favor of a relational DBMS with transaction support....

      Unfortunately, this system has to run "everywhere" on stock Perl and stock modules. No RDBMS for us. :( :( :(

      There's no code in the system that would time out - it's a straight open, flock, write, unflock, close, everywhere. If flocking is working as it should, the script SHOULD be waiting at the flock step until the other exclusive locks have been released and it can get one of its own.

      What APPEARS to be happening is that TWO instances of the script open the file at the same time, but one gets the exclusive lock first. It does its thing, then releases the lock. The second script then writes out ITS data to the start of the file (and/or resets the filehandle to the start - I'm not sure whether the file cursor comes from what the O/S reports or from that particular script's own position), and releases its lock.

      In other words, the flocking isn't doing the job well enough. I need to block before I even open the file.

      Thank you for your continued thoughts. It's VERY appreciated!

        Anytime your process is going to change a file you must exclusively lock, read, update, write, unlock.

        Only readers who will never write the data can use shared locking.

Re: File locking, lock files, and how it all sucks
by dragonchild (Archbishop) on Aug 21, 2001 at 23:35 UTC
    Although I am not fully conversant with your system, there seems to be a lack of coherent structure here. You seem to be opening files WELL before you need to write them. Write a module that does all your file I/O for you. Have it do the following:
    1. Check to see if the file is locked. (Are you doing this right now??)
    2. Take a lock on the file or block until you can.
    3. Open the file
    4. read/write/append/whatever
    5. Close the file.
    6. Release the lock.
    I would seriously look at doing these steps EVERY time you want to do some sort of file interaction. If you're not interacting with the file at that moment, you don't have a filehandle open. It may seem wasteful, but you have greater concerns than CPU optimization, namely synchronization.
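
    A bare-bones sketch of such a wrapper, using a sentinel lockfile so the lock exists before the data file is ever opened (the package and method names are illustrative, not from the original code):

    package FileGuard;    # illustrative name
    use strict;
    use Fcntl ':flock';

    # Take an exclusive lock on a sentinel "<datafile>.lock" before anyone
    # touches the data file itself; release it by letting the guard go out
    # of scope (or by closing the sentinel handle explicitly).
    sub new {
        my ($class, $datafile) = @_;
        open(my $lock, '>', "$datafile.lock")
            or die "Can't open $datafile.lock: $!";
        flock($lock, LOCK_EX) or die "Can't lock $datafile.lock: $!";
        return bless { lock => $lock, file => $datafile }, $class;
    }

    sub DESTROY { close $_[0]{lock} }   # closing the handle drops the lock

    1;

    Calling code would then do something like my $guard = FileGuard->new($path); before opening $path for the read/write/append, and let $guard fall out of scope once the file is closed.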

    ------
    /me wants to be the brightest bulb in the chandelier!

    Vote paco for President!

      Exclusive flock will wait for previous exclusives to release before it will do its thing, i.e. it blocks. We flock on every file read & write, so yes, we are checking.

      I've personally rewritten each of what used to be separate file I/O routines to use what is now only a handful - all of which use the same model:

      Verbatim from one of the routines:

$path")">
      open(FILE, ">$path") or die "$! writing $path";
      &lock;                # calls flock(FLOCK_EX) and logs
      print FILE $string;
      &unlock;              # calls flock(FLOCK_UN) and logs
      close FILE;

      (Yes, we are using strict... *FILE has been localized. The perils of inheriting someone else's work... :( :( :( )

      Further thoughts? Thanks!

        One of the common problems people have with file locking is failing to append correctly. Your verbatim example didn't suggest that you were appending, but if other snippets of your code read
>$path")">
        open(FILE, ">>$path") or die "$! writing $path";
        &lock;                # calls flock(FLOCK_EX) and logs
        print FILE $string;
        &unlock;              # calls flock(FLOCK_UN) and logs
        close FILE;
        then you have a race condition that can cause file corruption.

        Consider what happens if the open succeeds and the flock blocks to obtain the lock. The open has positioned you for writing to the end of the file. But if the process that holds the lock writes after you've opened the file, the end-of-file mark moves. You're now positioned to overwrite whatever the other process wrote. If your string is longer, their string gets lost. If your string is shorter, you probably corrupt the file.

        The way around this is to seek to end-of-file after you've obtained the lock.

>$path")">
        open(FILE, ">>$path") or die "$! writing $path";
        &lock;
        seek(FILE, 0, 2);     # ensure positioned at EOF
        print FILE $string;
        ...

        Update: This also applies when truncating a file. Only truncate after you've obtained an exclusive lock. If you truncate before locking, you risk pulling the rug out from under whatever process does have the lock.
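
        Applied to the ">$path" routine quoted above, that means opening without clobbering, taking the lock, and only then truncating. A sketch, not the original routine:

        use Fcntl qw(:DEFAULT :flock);

        # Sketch: open read-write without truncating, lock, and only then
        # throw away the old contents.
        sysopen(FILE, $path, O_RDWR | O_CREAT) or die "$! opening $path";
        flock(FILE, LOCK_EX)                   or die "$! locking $path";
        truncate(FILE, 0)                      or die "$! truncating $path";
        print FILE $string;
        close FILE;                            # releases the lock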