Jasper has asked for the wisdom of the Perl Monks concerning the following question:

I've had a few problems lately with file locking. Admittedly, the first solution I ran into trouble with was shoddy, to say the least.

Let's say I have a system where more than one process will try to grab a pair of files (two associated files), read it/them, copy it/them elsewhere, and delete the originals. I want only one copy of the originals to be floating around. The initial solution was

get handles,
lock,
copy,
unlock,
unlink.

Both of my processes were able to get file handles, so if the first released its lock before the second tried to get one, comedy ensued. s/comedy/trouble/.
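
In code, the broken sequence was roughly this (file names invented for the example; the comments mark where the race opens):

use Fcntl qw(:flock);
use File::Copy qw(copy);

open my $fh, '<', 'pair.dat' or die "open: $!";     # both processes can get here
flock $fh, LOCK_EX or die "flock: $!";              # one wins, the other waits
copy 'pair.dat', 'done/pair.dat' or die "copy: $!";
flock $fh, LOCK_UN;     # race window: the waiting process now gets its lock
                        # and copies the file again, before...
unlink 'pair.dat' or die "unlink: $!";              # ...this unlink happens
close $fh;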

I thought I might have fixed that by unlinking before releasing the locks, but I was still getting some duplicates: a filehandle still points to the file data even after the file has been unlinked, and that handle can still get a lock on it (naivety made me think this might work, I suppose).

The newest fix, which seems to be working (touch wood), only works because the file is actually two files: get a lock on one, then get a filehandle on the other, and then get a lock on that too. If any of these steps fails, give up. I think this succeeds only because one of the two files is essentially behaving as a lock file for the other.
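
Roughly, the new fix looks like this (names invented; LOCK_NB makes flock fail immediately instead of blocking, which is the "give up" part):

use Fcntl qw(:flock);

open my $fh1, '<', 'pair.dat' or exit;     # file already gone? give up
flock $fh1, LOCK_EX | LOCK_NB or exit;     # someone else holds it? give up
open my $fh2, '<', 'pair.idx' or exit;
flock $fh2, LOCK_EX | LOCK_NB or exit;
# ... only now copy and unlink both files ...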

My question is really this: is there a way to do this with only one file, not using a lock file?

I read the file locking tutorial 7058 (and the associated replies) by turnstep, but it still leaves some threads hanging. Also, I suppose this isn't really a Perl question, but I'm guessing there are people out there who have done something similar...

Re: File Locking revisited
by TedPride (Priest) on Dec 01, 2004 at 13:57 UTC
    What you need is a THIRD file. You should never lock the file(s) you're processing, as this leads to all sorts of problems in most situations.

    Lock the lock file
    Check the files. If they've been copied already, do nothing. If not, copy them.
    Unlock the lock file

    Simple as that. The key is to eliminate simultaneous processing of files, and by separating the lock file from the files to be processed, you can achieve that easily.
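
    A minimal sketch of that recipe, assuming a lock file name ('transfer.lock') that every competing process agrees on (file names are invented; move() does the copy-and-delete in one step):

    use Fcntl qw(:flock);
    use File::Copy qw(move);

    open my $lock, '>', 'transfer.lock' or die "lock: $!";
    flock $lock, LOCK_EX or die "flock: $!";   # serialize all competing processes

    if (-e 'pair.dat') {                       # not already handled by someone else?
        move 'pair.dat', 'done/pair.dat' or die "move: $!";
        move 'pair.idx', 'done/pair.idx' or die "move: $!";
    }

    close $lock;                               # releases the lock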

      What you need is a THIRD file. You should never lock the file(s) you're processing, as this leads to all sorts of problems in most situations.

      I was worried someone would say that, although, having thought about it, I think the two-file consecutive locking is (nearly) as good (as far as my processing goes).

      Check the files.

      Easier said than done, I think, but I'm sure something will be available.

      Thanks for the wisdom.
        I was worried someone would say that

        What's to worry about? If you're concerned about having to keep an extra file around, consider locking the script itself. This may not be appropriate in all cases, but it's a fairly common idiom:

        use Fcntl qw(:flock);

        # lock myself
        open my $lockfh, "<", $0 or die "Cannot lock myself: $!\n";
        flock $lockfh, LOCK_EX;
        # ...
        close $lockfh;
        You might try the code I posted a while back on this node -- it's a simple module that implements a nice semaphore file locking technique I pulled out of a TPJ article (the code includes a URL to the article, which was written by Sean Burke).

        Regarding this part of the OP:

        Let's say I have a system where more than one process will try to grab a pair of files (two associated files), read it/them, copy it/them elsewhere, and delete the originals. I want only one copy of the originals to be floating around. The initial solution was

        get handles,
        lock,
        copy,
        unlock,
        unlink.

        If I get what you're describing, multiple processes can be trying to access either of two files at any time, and will normally want to "open / read / close / make a copy elsewhere / unlink the original" on each file in succession.

        With a semaphore file, it would look like this:

        get lock on semaphore file
        for (file1, file2) {
            open
            read and copy
            close
            unlink
        }
        release semaphore file lock
        So long as all the competing processes are set to use the same semaphore file, this will ensure that only one process at a time can do anything at all with the two target data files.
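
        In Perl, that pseudocode might look something like this (semaphore file name and destination are invented; copy() handles the open/read/close steps internally):

        use Fcntl qw(:flock);
        use File::Copy qw(copy);

        open my $sem, '>', 'pair.sem' or die "semaphore: $!";
        flock $sem, LOCK_EX or die "flock: $!";   # get lock on semaphore file

        for my $file ('file1', 'file2') {
            next unless -e $file;                 # already handled by another run
            copy $file, "elsewhere/$file" or die "copy $file: $!";
            unlink $file or die "unlink $file: $!";
        }

        close $sem;                               # release semaphore file lock
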
Re: File Locking revisited
by bluto (Curate) on Dec 01, 2004 at 16:26 UTC
    I second the suggestion of using a third file. As with any type of locking, it is very hard or impossible to lock an object and then also synchronize the move/removal of that same object -- there are almost always hidden catches (i.e. race conditions). Sometimes you can code around them, but it tends to be much cleaner and simpler to give up and use a separate, higher-level lock.

    That said, sometimes I avoid locking when I have the luxury of moving the file to a private directory, on the same filesystem, that no other process will use. In that case, with a single file, the move tends to be atomic. Since you are using two files, there are probably other race conditions to worry about here, so YMMV.
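
    A rough sketch of that trick, with invented paths -- rename() is atomic within a single filesystem, so exactly one process succeeds in claiming a given file:

    # Claim the file by moving it into this process's private directory.
    my $dir = "spool/worker.$$";
    -d $dir or mkdir $dir or die "mkdir: $!";

    if (rename "spool/incoming/pair.dat", "$dir/pair.dat") {
        # this process owns the file now; copy and remove it at leisure
    }
    else {
        # another process claimed it first (or it was never there)
    }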

Re: File Locking revisited
by paulbort (Hermit) on Dec 01, 2004 at 21:09 UTC
    Other Monks have already suggested using a third file, and I agree that what is missing here is a resource, but I would be inclined not to use file locking at all. If you treat this as a serialization problem instead of a contention problem, a different solution presents itself: funnel all requests for the files to be copied to a second program, which can deal with the requests in turn. The funnel could be appends to a text file (which the OS will usually keep atomic), records written to a queue table in a database, or a socket.

    Then all the second program does is watch for a signal to do the copy, and act on that. Since it is the only one doing the copy, no contention issue. If you want to discard some copy requests based on other criteria (can't copy within one minute of last copy, etc.) that becomes easy.

    This also has other advantages, in that it can handle copy requests from programs that don't use Perl's locking semantics, can log all of the copy requests, etc.
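
    One hypothetical shape for the funnel: requesters append one line per request to a queue file opened with O_APPEND (small appends are effectively atomic on local filesystems), and the single copier process consumes the lines in order:

    use Fcntl qw(O_WRONLY O_APPEND O_CREAT);

    # Requester side: queue a copy request and move on.
    sysopen my $q, 'copy.queue', O_WRONLY | O_APPEND | O_CREAT
        or die "queue: $!";
    print {$q} "pair.dat pair.idx\n";
    close $q;

    # The copier (the only process that touches the data files) reads this
    # queue and performs each request in turn; a real version would track
    # its read offset so requests are not replayed.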

    --
    Spring: Forces, Coiled Again!