in reply to NFS File Locking

If you unlink after opening but before processing, that would reduce the race window. On a local filesystem that fully removes the file from the directory listing, but as I understand it, over NFS the client renames a file that is still open to ".nfsXXXX" rather than unlinking it. Other processes would ignore such file names. If it were me, I'd experiment with doing the rename manually, just to be certain about the rename pattern. You may also need to periodically clean up or process old .nfsXXXX files in case something crashes during processing.

Update: ... or just flat-out rename before opening (to .processingXXXX-ORIGINALNAME) using the rename as a lock operation?
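
A rough sketch of the rename-as-lock idea, written in Python for brevity (Perl's built-in rename would be used the same way). The function name claim_by_rename and the worker_id argument are made up for illustration; only the ".processingXXXX-ORIGINALNAME" pattern comes from the suggestion above. It assumes that when two workers race, only one rename of the original name succeeds and the other sees "no such file".

```python
import os


def claim_by_rename(path, worker_id):
    """Try to claim `path` by renaming it.

    Returns the new path on success, or None if the file was already
    gone (for example, because another worker claimed it first).
    """
    directory, name = os.path.split(path)
    # Hypothetical naming scheme: .processing<worker_id>-<original name>
    claimed = os.path.join(directory, f".processing{worker_id}-{name}")
    try:
        os.rename(path, claimed)
    except FileNotFoundError:
        return None
    return claimed
```

Note that nothing here releases the claim if the worker dies, so a crashed worker would leave its .processing file behind until something cleans it up.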

Good Day,
    Dean

Re^2: NFS File Locking
by Corion (Patriarch) on Apr 09, 2026 at 11:57 UTC
Re^2: NFS File Locking
by jbw8387 (Novice) on Apr 09, 2026 at 21:50 UTC

    Thanks Dean. I think this is the problem. If a file is deleted on NFS while there are still open file handles, it is temporarily renamed to .nfsXXX and the file handles remain valid. This can create exactly the race I was observing: Machine A holds a lock. Machine B opens a file handle, but before it can request a lock, Machine A deletes the file and releases the lock. Now Machine B gets the lock on the renamed file. Explicitly renaming the file doesn't help here. I think the workaround is just to check that the file still exists (under the original name) after acquiring the lock. If it doesn't, just close the file and treat the lock as failed.
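
The check described above might look roughly like this, sketched in Python (Perl's flock and stat would be used the same way). The function name lock_if_still_present is made up for illustration. Comparing device and inode numbers is an extra step beyond the plain existence check in the post: it also catches the case where the original name has been recreated as a different file. NFS attribute caching could make the stat result stale, so treat this as a sketch rather than a guarantee.

```python
import fcntl
import os


def lock_if_still_present(path):
    """Open and flock `path`, then confirm the name still points at the
    file we locked. Returns the open file on success, otherwise None."""
    try:
        # Opened read-write on the assumption that the NFS client needs a
        # writable handle to grant an exclusive lock.
        f = open(path, "r+")
    except FileNotFoundError:
        return None
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        f.close()  # someone else holds the lock
        return None
    try:
        on_disk = os.stat(path)
    except FileNotFoundError:
        f.close()  # deleted between our open and our lock
        return None
    held = os.fstat(f.fileno())
    if (on_disk.st_dev, on_disk.st_ino) != (held.st_dev, held.st_ino):
        f.close()  # the name now refers to a different file
        return None
    return f
```

Closing the returned file releases the lock.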

    I didn't mention the reason for doing this in the original question, but the goal is to parallelize work across many machines. Each file represents some work, and the locking makes sure that two machines are not working on the same thing. The problem is that the machines are also used for other things and are often shut down or crash in the middle of doing work. Using "flock" is nice because the NFS locks are automatically released if a machine is reset or crashes. This allows another machine to come along later, pick up the work and do it again. The file is not deleted or renamed until the work is confirmed complete. Using "rename" or "mkdir" for locking leaves the lock orphaned if the machine holding it crashes. Doing the same work twice is not a problem as long as it doesn't happen too often. Failing to complete work because the machine doing it crashed in the middle is the problem I want to avoid.
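
Put together, one pass of such a worker could be sketched as follows (Python again, purely as illustration). The names run_worker and do_work are hypothetical, and skipping dot-files is an assumption made so that leftover .nfsXXX entries are ignored. The file is deleted only after do_work returns; if do_work raises, or the machine dies, the file stays in place and the lock goes away with the file handle.

```python
import fcntl
import os


def run_worker(work_dir, do_work):
    """Process every unclaimed work file in `work_dir` once.

    `do_work` is a caller-supplied function taking the open file.
    Returns the names of the files this worker completed.
    """
    done = []
    for name in sorted(os.listdir(work_dir)):
        if name.startswith("."):
            continue  # skip .nfsXXX leftovers and other hidden files
        path = os.path.join(work_dir, name)
        try:
            f = open(path, "r+")
        except (FileNotFoundError, IsADirectoryError):
            continue
        with f:  # closing the file releases the lock
            try:
                fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError:
                continue  # another worker is on it
            if not os.path.exists(path):
                continue  # finished and deleted while we were opening
            do_work(f)
            os.unlink(path)  # only after the work is complete
            done.append(name)
    return done
```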