in reply to Re: How to do atomic file locking?
in thread How to do atomic file locking?

Thanks for the advice. I am currently going over the docs.

Out of curiosity, how did you resolve the atomicity issue? I mean, did you use some other Perl technique, or did you move that part out of Perl altogether and use something else?

What do you mean by flock being graceful? You mean if flock somehow crashed or misbehaved it didn't mess up things?

Thanks again!

Re^3: How to do atomic file locking?
by marinersk (Priest) on Sep 02, 2015 at 22:42 UTC

    I just realized I never answered your question about what I meant by "graceful".

    You ask Perl to get a lock on the file, and off you go. Cleanup is a matter of closing the file.

    No funky coding, no oddball exception handling, no having to overthink the problem. Ask it to do its job, and it does -- no muss, no fuss, no thirty lines of code. That, to me, is the epitome of grace.
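
    A minimal sketch of that pattern, assuming a system where Perl's flock is supported. The helper name append_locked and the line being written are made up for illustration; test.dat is just the example filename used elsewhere in this thread.

    ```perl
    use strict;
    use warnings;
    use Fcntl qw(:flock);    # imports LOCK_SH, LOCK_EX, LOCK_UN, LOCK_NB

    # Hypothetical helper: append one line to $path under an exclusive lock.
    sub append_locked {
        my ($path, $line) = @_;

        open(my $fh, '>>', $path) or die "Cannot open $path: $!";

        # Blocks until no other process holds a lock on this file.
        flock($fh, LOCK_EX) or die "Cannot lock $path: $!";

        print {$fh} $line, "\n" or die "Cannot write to $path: $!";

        # Closing the handle flushes the data and releases the lock.
        close($fh) or die "Cannot close $path: $!";
        return 1;
    }

    append_locked('test.dat', 'one more record');
    ```

    Note that flock locks are advisory: they only protect against other processes that also call flock on the same file.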

Re^3: How to do atomic file locking?
by marinersk (Priest) on Aug 31, 2015 at 21:03 UTC

    If memory serves, I coded up a primitive two-stage, file-based, self-help locking mechanism. Something akin to what follows. In this example, for file test.dat:

    1. Create a file (lock request token) based on the original filename
      • Include its creation timestamp
      • Include its modification timestamp
      • Include its server and unique PID
      • Make these sortable by creation, server, pid, modification
    2. Loop until it is the oldest lock request for this file
      • Re-create your lock request token file if it's been removed.
      • Update your lock request token modification timestamp
      • Delete requests older than the lock request timeout
      • sleep a short bit (don't burn the CPU)
    3. Create another file (lock bid token) based on the original filename
      • Include its creation timestamp
      • Include its modification timestamp
      • Include its server and unique PID
      • Make these sortable by creation, server, pid, modification
    4. Loop until it is the oldest lock bid for this file
      • Re-create your lock bid token file if it's been removed.
      • Update your lock bid token modification timestamp
      • Delete bids older than the lock bid timeout
      • sleep a short bit (don't burn the CPU)
    5. Unlocking consists of removing the lock request and lock bid files
    6. Long-running processes need to have a mechanism for updating the modification timestamp on the bid token to avoid it being deleted by others using the same self-help system.

    Then requests for lock simply queue up. This is not collision-proof, but in a low-volume environment it was sufficiently collision-resistant for my needs (the math was favorable).

      Wow. I discourage anybody from trying to repeat this exercise.

      This looks like an example of something I've seen happen many times: A relatively inexperienced (with locking, anyway) person tries to solve a problem with locking by piling on complexity. The likely best solution (IME) is usually actually removing complexity. Part of why I know of these things is because I've had to step in and fix the problems that resulted when the implementation grew beyond the "toy" stage.

      I'd love to hear some actual facts or details about the purported problems with flock. The description makes me suspicious in itself because Perl's flock is just a thin wrapper around exactly one function from your operating system: either flock(2), fcntl locks, lockf (on systems I appear to never have used), or LockFileEx (on Windows). So I'd expect any discussion of problems with Perl's flock to begin by specifying which of those choices the problem applies to. If you don't start by determining that, then you are stabbing in the dark.

      The primary problem I can think of with Perl's flock is that it is most often implemented via flock(2) and that doesn't work over NFS. That's one reason that I wish Perl would prefer fcntl locks over flock(2). But using fcntl locks directly from Perl is not particularly hard so I'll often do that (even when not dealing with NFS; fcntl locks are just much more powerful and also have slightly better semantics at the edges than flock(2), IMO).

      Now, it is rare that I find myself on a system where fcntl locks don't work over NFS. I find lots of complaints about that, but I've just almost never run into a system with such problems. These days, you really should just expect that your Linux supports fcntl locks over NFS unless you are using something fairly ancient.

      But even if you've got a system where fcntl locks don't work over NFS, the answer is not something complicated. Just use link. Create a file named "lock.$$.$host" and write your host name (or IP address) and process ID into it. Then call link("lock.$$.$host","lock"). If that works, then you own the lock (and you can then unlink "lock.$$.$host"). If it fails, then somebody beat you to it.

      But I almost never do that because I sometimes want to reliably detect if the process holding the lock is actually still running. But then, NFS rather sucks anyway (failure is handled badly and the "fix" for that, autofs, actually means failure is handled even worse, at least in some respects) so I prefer to just avoid it anyway.
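
      The link technique described above can be sketched roughly as follows. The helper names try_link_lock and release_link_lock are made up for illustration, and the link-count check is an addition taken from the open(2) manual's advice for NFS, not something spelled out in the description above.

      ```perl
      use strict;
      use warnings;
      use Sys::Hostname qw(hostname);

      # Hypothetical helper: try to take the lock file $lock (e.g. "lock").
      # Returns 1 if we now own the lock, 0 if somebody beat us to it.
      sub try_link_lock {
          my ($lock) = @_;
          my $host = hostname();
          my $tmp  = "$lock.$$.$host";    # unique per process and host

          # Record who we are, so a human can see who holds the lock.
          open(my $fh, '>', $tmp)  or die "Cannot create $tmp: $!";
          print {$fh} "$host $$\n" or die "Cannot write $tmp: $!";
          close($fh)               or die "Cannot close $tmp: $!";

          # link fails if $lock already exists; that is what makes this atomic.
          my $linked = link($tmp, $lock);

          # Over NFS the return value of link can be unreliable, so also check
          # whether the unique file's link count went up to 2.
          my $nlink = (stat $tmp)[3] // 0;

          unlink $tmp;
          return ($linked || $nlink == 2) ? 1 : 0;
      }

      # Releasing the lock is just removing the lock file.
      sub release_link_lock {
          my ($lock) = @_;
          return unlink $lock;
      }
      ```

      This sketch makes no attempt to detect a stale lock left behind by a dead process, which is exactly the limitation mentioned above.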

      And if you aren't dealing with both NFS and an ancient OS, then you can also just pass O_EXCL to sysopen. That calls open(2) and that manual's section on O_EXCL even documents how to use link to get reliable locks over NFS simply (just like I described above).
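
      A rough sketch of the O_EXCL approach, assuming a local filesystem (or an NFS setup where O_EXCL is honored). The helper name try_excl_lock and the file mode are illustrative choices, not anything prescribed above.

      ```perl
      use strict;
      use warnings;
      use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
      use Errno qw(EEXIST);

      # Hypothetical helper: try to create the lock file $lock exclusively.
      # Returns 1 if we created it (we own the lock), 0 if it already existed.
      sub try_excl_lock {
          my ($lock) = @_;

          # O_CREAT | O_EXCL makes "check that it is absent, then create it"
          # a single atomic step.
          if (sysopen(my $fh, $lock, O_WRONLY | O_CREAT | O_EXCL, 0644)) {
              print {$fh} "$$\n" or die "Cannot write $lock: $!";
              close($fh)         or die "Cannot close $lock: $!";
              return 1;
          }

          return 0 if $! == EEXIST;          # somebody else holds the lock
          die "Cannot create $lock: $!";     # any other error is a real failure
      }
      ```

      Unlocking is an unlink of the lock file. As with the link approach, a crashed holder leaves the file behind.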

      So, you can use Perl's flock, fcntl, sysopen, or link, roughly in that order and each of those is simpler, more reliable, and faster than the proposed solution I replied to.

      - tye

        Excellent information, tye, and yes, I would not consider using the haphazard solution I cobbled together described above for anything where a better mechanism could be found.

        In the case where I did deploy this mockery, a friend had a web site where he'd used my original file locking module (which uses flock), and it had worked fine for, I seem to recall, years.

        Then suddenly it would start issuing errors -- it's been a long time, but I have a vague recollection it was indicating something about running out of file handles or something.

        Anyway, we pored over the code and added logging and could find no reason that made sense to us at the time to explain consuming file handles; the web host provider was absolutely useless in assisting on the troubleshooting, flatly refusing to share much of any information at all. They had no shell access, no help from the web host provider, and no logical explanation for the error. They were kind of boxed into a corner.

        They needed their event registration page functional, and we were running out of time. So, we hastily threw some quick workaround ideas onto the table, decided on the monstrosity above, and coded it. It got them through the registration window, and then they switched web hosting providers, reverted back to using the module employing flock, and, as far as I know, they've had no further problems with it.

        I think he'd later determined that they'd changed the underlying OS without telling any of their clients -- or something equally absurd. It's been too long ago; the details are gone. The monstrosity above got us through in a pinch.

        Update: To answer your other point, I seem to recall $^O was reporting an odd flavor of *nix whilst the problem was afoot. And for some reason BSD or NetBSD is ringing a bell. But I just don't have the details anymore.

        Update 2: Another point: I've scanned my code repository going back to 2003. My last file-level locking code was in 2008, and it used the old module with flock. I don't see anywhere that I've used the beast I described above except in that one emergency. Thus, my original statement that I found something else and didn't have to go back appears to be me being misled by faulty memory; all serialization of activities since 2008 has been done using database queueing systems.

        It's weird, not having precise and seemingly infinite recall anymore. I've benefitted from that trait my whole life; it would seem I'm now a sitting duck for a number of old jokes about memory. :-)