in reply to Re (tilly) 2: flock - lordy me....
in thread flock - lordy me....

This is mostly a question. My impression is that flushing your buffers doesn't mean that the data has been written to the storage medium. But it does mean that the data is visible to other processes. So if another process goes to run after you've flushed your buffers, that data may not be on the storage medium but the other process should end up reading your data from the OS cache.
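That mental model can be sketched in Perl. This is only an illustration, assuming a Unix-ish system with a working fork; the file name shared.txt is invented for the example:

```perl
use strict;
use warnings;
use IO::Handle;                    # for the flush method on older perls

# Write and flush, then let a *separate* process read the data.
# The data may not be on the platter yet, but it is in the OS cache,
# so the other process still sees it.
open(my $out, '>', 'shared.txt') or die "open: $!";
print $out "sequence 42\n";
$out->flush or die "flush: $!";    # push the stdio buffer into the kernel

my $pid = fork() // die "fork: $!";
if ($pid == 0) {                   # child: a separate process
    open(my $in, '<', 'shared.txt') or die "open: $!";
    print scalar <$in>;            # reads the data via the OS cache
    exit 0;
}
waitpid($pid, 0);
```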

So you can still have a problem if, for example, the power fails after you get your sequence number. But delaying the unlocking doesn't help that case anyway.

I just noticed that merlyn made a good point about the danger of your read-cache being stale, but I don't think that applies in this particular case.

So my impression is that the explicit unlock isn't actually a problem in this specific case. I agree that it is a good thing to avoid as a general principle.

So my question is "Am I missing something?" I don't want to detract from the good points made. But I'm hoping for validation or specific refutation of my understanding here.

        - tye (but my friends call me "Tye")


(tilly) : Can't find it :-(
by tilly (Archbishop) on Jan 17, 2001 at 00:12 UTC
    Really irritating, I know I saw it listed as a bug, but I cannot find reference to it now. In general on a properly working Unix system, yes. Writes go to a buffer that gets flushed to the kernel whenever. From the kernel it gets flushed to disk whenever. (You don't want your program waiting on your disk drive. Really. Unless you are a database, know exactly what you are doing, and are looking for solid transactional semantics...)

    If you manually flush the buffer (see select for how to do this) then in theory that goes to the kernel (or over NFS goes to the server), and you don't return from that until after the change is globally visible. So flushing your buffer and then unlocking should be safe. Indeed page 246 of the Perl Cookbook claims that as of Perl 5.004 there is an automatic flush when you unlock a filehandle. perldoc -f flock agrees with that. (So knowing how to flush manually matters more when you are working in other languages.)
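    As a sketch of that flush-then-unlock sequence — the seq.txt counter file is just an assumed example, not something from the thread:

```perl
use strict;
use warnings;
use IO::Handle;                         # flush method on older perls
use Fcntl qw(:flock O_RDWR O_CREAT);

# Read-increment-write a counter file, flushing explicitly before unlocking.
# (On Perl 5.004+ the unlock itself would also flush, per perldoc -f flock.)
sysopen(my $fh, 'seq.txt', O_RDWR | O_CREAT) or die "open: $!";
flock($fh, LOCK_EX) or die "flock: $!";
my $seq = <$fh>;                        # undef on a brand-new file
$seq = ($seq // 0) + 1;
seek($fh, 0, 0)  or die "seek: $!";
truncate($fh, 0) or die "truncate: $!";
print $fh "$seq\n";
$fh->flush          or die "flush: $!"; # data now visible to other processes
flock($fh, LOCK_UN) or die "unlock: $!";
close($fh)          or die "close: $!";
```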

    But I do have a memory wandering around that I cannot track down right now saying that theory != practice in some situations on some operating systems. (What, you mean that some OSs have bugs??? Yup!) I don't have a reference though. :-(

    However if you scan for "dual-buffer" in this summary of what is new in Linux 2.4, you will find that there used to be separate read and write buffers in the Linux kernel. I know that led to bugs; I don't know if any of those are applicable here. (If you also search for "raw device" you will find a discussion of raw I/O devices. If you have a database on Linux and it is not on a raw I/O device, then it cannot give the perfectly protected transactional support I mentioned above.)

    There is a bug I have encountered on Windows that may be related, where just relying on close is not enough. I found when working in a directory mounted remotely through SMB that if I wrote and closed a file, then *immediately* ran a system command that accessed that file, occasionally my write was not visible. If I put in a delay, it never failed.

    There may be all sorts of system specifics needed to hit it, and even where it happened it was only occasional. The fact that these bugs depend on the install was driven home for me by another one that I reported and eventually just worked around. rename on NT will set case if the filesystem understands it. But if its only effect is to change the case of an existing file, then depending on your exact DLLs, the kernel may delete the file instead. This is only seen on some installs of NT. Specifically, it did not happen in testing or on any machine for a year, until we got some new desktops in and then lost a bunch of files...

    But back to your question. I am fairly sure that I have seen a bug report where it really was better to just close the file rather than try to flush and unlock first. But in theory there is no reason why that should ever be true. And I cannot find the example right now.

    ObRandomTrivia: flock may not work on all filesystems even if the OS supports it. I know that the Linux folks in particular don't support it over NFS, and a lot of databases (eg Oracle, Berkeley DB) say, and really mean, that they are not to be run off of NFS partitions.

      I guess all bets are off in the case of kernel bugs. But I'd leave the manual flush in since the kernel might implement close() incorrectly and do the lock freeing before it does the buffer flushing.

      I would hope that flock would return a failure indication if you used it on a filesystem where it wasn't supported. Not that anyone has been checking the return code in their examples so far...

              - tye (but my friends call me "Tye")
        Why do we always seem to disagree? :-)

        Yes, all bets are off in the case of a kernel/filesystem bug. But I am unconvinced that it is better to make the flush manual. I prefer the simpler code on general principles. But I would also isolate the calls into an atomic interface, so that if something did go wrong I could fix it more easily. Besides which, I am then in a better position to move to fcntl or some other kind of locking if I need it at some point.
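        A sketch of what such an atomic interface might look like; next_seq and the seq.txt file name are illustrative, and this version leans on close doing the flush and the unlock:

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);

# Hypothetical wrapper: all the locking lives behind one interface, so
# the locking scheme (flock vs. fcntl) can be swapped without touching
# any caller.
sub next_seq {
    my ($file) = @_;
    sysopen(my $fh, $file, O_RDWR | O_CREAT) or die "open $file: $!";
    flock($fh, LOCK_EX) or die "flock $file: $!";   # check the return code
    my $seq = <$fh>;
    $seq = ($seq // 0) + 1;
    seek($fh, 0, 0)  or die "seek: $!";
    truncate($fh, 0) or die "truncate: $!";
    print $fh "$seq\n";
    close($fh) or die "close $file: $!";   # close flushes and drops the lock
    return $seq;
}

print next_seq('seq.txt'), "\n";
```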

        BTW I already did isolate locking somewhere else. And now you cannot complain that nobody has posted examples where the return code of flock is checked. :-)

        UPDATE
        Thanks tye for pointing out that one not was not wanted. That noted not was not for naught. :-)