cLive ;-) has asked for the wisdom of the Perl Monks concerning the following question:

Hmmm, this one puzzles me. I want to read in a file and delete it, but the file randomly gets appended to and I don't want to lose any appended stuff accidentally.

What I was thinking was:

    open(FILE,$file) || die $!;
    flock(FILE,2);
    my $file_content = join '', (<FILE>);
    flock(FILE,8);
    unlink $file;

But, the flock releases the file before it's deleted! Is there a chance that data could be appended between the flock(FILE,8) and the unlink?

Is there a better way to do this?

Thoughts?

cLive ;-)

Replies are listed 'Best First'.
Re: correct usage of flock?
by wog (Curate) on Jun 06, 2001 at 04:27 UTC
    A potential problem here is that the file still exists after you unlink it, at least on UNIX-like systems, for any process that already has it open. This means another process could open the file, try for the lock, not get it, and by the time it does get the lock you've unlink'd the file. UNIX still lets that program write to it, but the data goes nowhere. That's not good, now is it.

    One idea you could try is to rename the file first, so that programs that still have the file open will probably have their data go somewhere, and then read the renamed file.
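
    A minimal sketch of the rename-first idea, assuming a UNIX-like system and that every appender takes an exclusive flock before writing. The sub name drain_by_rename and the "$file.PID.TIME" naming scheme are made up for illustration. A writer that opened the file before the rename but has not yet taken its lock can still slip in after the read, so this narrows the window rather than closing it.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Move the file aside under a unique name, then read and delete the copy.
# Returns the contents, or undef if there was no file to collect.
sub drain_by_rename {
    my ($file) = @_;
    my $tmp = "$file.$$." . time;    # hypothetical naming scheme

    # rename() is atomic within one filesystem: anything that opens
    # $file by name afterwards gets a fresh file, not $tmp.
    unless (rename $file, $tmp) {
        return if $!{ENOENT};        # nothing to collect
        die "rename $file: $!";
    }

    # Opened read-write because some flock emulations require write
    # intent for an exclusive lock.
    open my $fh, '+<', $tmp or die "open $tmp: $!";

    # Wait for any appender that is holding its lock right now.
    flock $fh, LOCK_EX or die "flock $tmp: $!";

    my $content = do { local $/; <$fh> };
    unlink $tmp or die "unlink $tmp: $!";
    close $fh;                       # closing releases the lock
    return $content;
}
```

    Called as my $data = drain_by_rename($file); it returns undef when the file does not exist.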

      I think this is a good way to go about it. However, if it's likely that separate invocations of this program will occur, a random filename would be a Good Thing.

      There's also the tried and true trick of writing a sentinel file that other programs check for and then seeking to the beginning of the original file and truncating it, but renaming seems much easier.
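
      A rough sketch of the truncate step only, with the sentinel file left out. It assumes every appender opens the file in append mode ('>>') and takes an exclusive flock before writing, so the lock alone keeps writers out while the file is emptied. The sub name drain_in_place is hypothetical. The file is emptied rather than deleted, so its name never disappears.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);

# Read everything from $file and empty it while holding an exclusive lock.
sub drain_in_place {
    my ($file) = @_;

    # Read-write mode without truncating on open.
    open my $fh, '+<', $file or die "open $file: $!";
    flock $fh, LOCK_EX or die "flock $file: $!";

    my $content = do { local $/; <$fh> };
    $content = '' unless defined $content;

    # Go back to the start and cut the file to zero length.
    seek $fh, 0, SEEK_SET or die "seek $file: $!";
    truncate $fh, 0 or die "truncate $file: $!";

    close $fh or die "close $file: $!";   # closing releases the lock
    return $content;
}
```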

      Cheers,
      Ovid


Re: correct usage of flock?
by Vynce (Friar) on Jun 06, 2001 at 04:31 UTC

    yes, it's very possible. you might want to consider other methods. i have one idea, but it assumes two things (though it might be adaptable to other environments):

    1. unix-like system
    2. you know where those appends are coming from
    my suggested alternate plan is this:
    1. move the file (`mv flat.file flat.file.tmp`)
    2. HUP any process that might still have the file open for writing, or wait until they die themselves (see below)
    3. read and delete the temporary version
    as i said, it relies on you knowing what might write to that file. if those programs open, write and re-close the file, then you can just wait the maximum time it might take one of them to finish. if they keep it open, however, you'll want to HUP them or otherwise force a close-reopen cycle. this sounds extreme, i know, but it's safe in more circumstances than you might think, for example with apache servers.

    the reason this works at all is that if you move a file that another process already has open, the other process won't notice, and will keep writing to it without a problem.
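
    a small demonstration of that behaviour, assuming a unix-like system where an open handle follows the file rather than its name. the sub name write_across_rename is made up, and the file names simply reuse the ones from the plan above inside a temporary directory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write through one handle before and after renaming the file, then
# read the renamed file back. Returns (contents, old-name-exists flag).
sub write_across_rename {
    my $dir = tempdir(CLEANUP => 1);
    my ($old, $new) = ("$dir/flat.file", "$dir/flat.file.tmp");

    open my $w, '>>', $old or die "open $old: $!";
    print {$w} "before rename\n";

    rename $old, $new or die "rename: $!";

    # Same handle, same underlying file; only the name has changed.
    print {$w} "after rename\n";
    close $w or die "close: $!";

    open my $r, '<', $new or die "open $new: $!";
    my $content = do { local $/; <$r> };
    close $r;

    return ($content, -e $old ? 1 : 0);
}
```

    on such a system both lines should end up in flat.file.tmp, and flat.file should no longer exist.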

    the problem, really, is that unix doesn't provide you with any way to delete from the front of a file. maybe some day there will be an OS without that problem.

Re: correct usage of flock?
by dws (Chancellor) on Jun 06, 2001 at 04:35 UTC
    I don't have a correct answer in hand, though if I were faced with this problem, I'd look at what server log rotation tools do. They face a similar problem.

Re: correct usage of flock?
by sierrathedog04 (Hermit) on Jun 06, 2001 at 06:22 UTC
    flock(FILE,2);
    is an exclusive lock. That means that you can write or delete the file but others who use flock cannot. Therefore the correct approach is to make
    flock(FILE,8)
    the last line of your program. Then you are protected from race conditions.

    I tried it out on my Linux box, and it works great. You can unlink a file while you have an exclusive lock on it, and indeed that is the best way to prevent anyone from writing to it before you are finished deleting it.
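
    A minimal sketch of that ordering, using the Fcntl constants LOCK_EX in place of the literal 2 and letting close release the lock instead of an explicit flock(FILE,8). The sub name read_then_unlink is hypothetical. flock is advisory, so this only holds off processes that also call flock, and a process that opened the file before the unlink still holds a usable handle to it.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Read the file and unlink it before the lock is released.
sub read_then_unlink {
    my ($file) = @_;

    # Read-write mode: some flock emulations need write intent
    # for an exclusive lock.
    open my $fh, '+<', $file or die "open $file: $!";
    flock $fh, LOCK_EX or die "flock $file: $!";

    my $content = do { local $/; <$fh> };

    # The name goes away while the lock is still held.
    unlink $file or die "unlink $file: $!";

    close $fh;    # the lock is released only here
    return $content;
}
```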

      ?? So you're saying I can unlink that file whilst the FILE filehandle is still open? Weird. I'll give it a go.

      cLive ;-)

        FWIW, this won't protect you from the case where two processes open the file at about the same time. The unlinked file won't be openable by other processes, since the namespace entry will be gone, but the two processes which already have it opened can continue to access it.

        The first process will do its thing and then delete the file. The second process will grab the lock on the still opened file, and then process as if it had the lock -- it won't know the file has been unlinked. If you can live with this, that's fine. If you want "exactly once" semantics, it's not.

        FWIW, in general it's not a good idea to try to use an object's lock to synchronize the destruction of that object -- it is easy to overlook a race condition. You might get away with this in certain cases, but in general you'll rest easier if you use a different way. See one of the other methods described above (e.g. Ovid's 'sentinel', file renaming, etc.)
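
        One way a process can find out that the file it just locked has been unlinked or renamed is to compare the device and inode of its handle with those of the name after the lock is granted. This is not something proposed in the thread, just a sketch of a common technique for the appending side. The sub name locked_append and the retry limit of 5 are arbitrary, and it assumes a UNIX-like filesystem with stable inode numbers.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Append $line to $file, retrying if the file was unlinked or renamed
# between our open() and the moment we were granted the lock.
sub locked_append {
    my ($file, $line) = @_;

    for my $attempt (1 .. 5) {             # arbitrary retry limit
        open my $fh, '>>', $file or die "open $file: $!";
        flock $fh, LOCK_EX or die "flock $file: $!";

        my @held  = stat $fh;              # the file we hold the lock on
        my @named = stat $file;            # what the name points at now

        # Same device and inode: the handle is still the named file.
        if (@named && $held[0] == $named[0] && $held[1] == $named[1]) {
            print {$fh} $line or die "print $file: $!";
            close $fh or die "close $file: $!";
            return 1;
        }

        close $fh;                         # stale handle, try again
    }
    die "could not get a stable handle on $file";
}
```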

Re: correct usage of flock?
by Anonymous Monk on Jun 06, 2001 at 19:08 UTC
    Get a flock on another file and then open+delete. The Server of course must get the flock on this other file too to open+append.
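
    A sketch of that arrangement, assuming both sides agree on a lock file named "$file.lock" (a made-up convention) that is never deleted. The subs with_lock, append_line and read_and_delete are hypothetical names, not part of any module.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Run $code while holding an exclusive lock on a separate lock file.
sub with_lock {
    my ($lockfile, $code) = @_;
    open my $lock, '>>', $lockfile or die "open $lockfile: $!";
    flock $lock, LOCK_EX or die "flock $lockfile: $!";
    my @result = $code->();
    close $lock;                     # closing releases the lock
    return wantarray ? @result : $result[0];
}

# Writer side: append one line under the shared lock.
sub append_line {
    my ($file, $line) = @_;
    with_lock("$file.lock", sub {
        open my $fh, '>>', $file or die "open $file: $!";
        print {$fh} $line or die "print $file: $!";
        close $fh or die "close $file: $!";
    });
    return 1;
}

# Reader side: read and delete the data file under the same lock.
# Returns undef if the data file could not be opened.
sub read_and_delete {
    my ($file) = @_;
    return with_lock("$file.lock", sub {
        open my $fh, '<', $file or return;
        my $content = do { local $/; <$fh> };
        close $fh;
        unlink $file or die "unlink $file: $!";
        return $content;
    });
}
```

    Because the data file is only ever opened while the lock file is held, no process is left holding a handle to a file that has since been deleted.
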
Re: correct usage of flock?
by DBX (Pilgrim) on Jun 06, 2001 at 05:54 UTC
    Are the other programs that write to this file using flock also? flock is useless unless everything that writes to it uses it.
      yes, the append script uses flock, and is on the same server. So I could check whether it's locked, and move it if it isn't? Seems like the best solution to me.

      cLive ;-)