Zarathustra has asked for the wisdom of the Perl Monks concerning the following question:


Greetings!

This is my first post, hope I don't do or say anything
*too* stupid...  <grin>

Anyhow, I'm currently coding up a little something that
writes to mail spools. It is very likely ( in fact almost
certain ) that there will be plenty of concurrent attempts
to write to these spools - so I'm using flock.  Also, I'm
syslogging the whole process and want to know when a lock
is being awaited, and I also want to make a time-out period
for waits on a lock - therefore I'm using the alarm call
and the $SIG{ALRM} trick to break out should a lock wait
take too long.  .... phew ....

SO - what I'm wondering is whether I'm getting *way* too
pedantic, over-cautious and paranoid in my attempts to
safeguard against data corruption and race conditions.

Also, I'd like to know *just* how possible it is for
a race condition to occur between two consecutive 
statements.  A snippet of my current code follows
shortly - I'd very much appreciate any and all advice 
concerning my logic - but first, just a couple of
scattered questions...

eval {
  $SIG{'ALRM'} = sub { die "timed out!" };
  alarm 10;
  flock(FH, LOCK_EX) or die "couldn't flock: $!\n";
  # is there a race condition between these two points??!
  alarm 0;
};
die "$@" if $@;

What if the timeout occurs *just after* the flock is
successful and *just before* the alarm gets disabled? 
Is this even remotely possible?  If so, then the reason 
for die'ing in the first place ( timeout grabbing the lock )
could possibly be in error ( we *did* in fact grab the 
lock before 10 seconds, but just barely, however the ALRM 
sig still occurred before we were able to disable it ).
Now if such a race between two consecutive statements does 
in fact exist, would the following buy any more certainty?

eval {
  $SIG{'ALRM'} = sub { die "timed out!" };
  alarm 10;
  ( flock(FH, LOCK_EX) and alarm 0 ) 
    or die "couldn't flock: $!\n";
};


At any rate, at this time I'm going to apologize for this
behemoth post... 

The following is a snippet of my code, commented to
( hopefully ) make my cluelessness and misunderstanding
completely obvious to all, please post your suggestions -
I can really use the advice... thanks!

( also, why do I need the newline in the die for the 
$SIG{ALRM} sub??? and is that seek() even doing anything?
I'm trying to be certain that the file is in the exact
same condition as it was when I first opened it )

#!/usr/bin/perl -w

use Fcntl qw( :DEFAULT :flock );

$splfile = '/tmp/file.txt';

# open the backup mail spool
open(SPOOL, ">>$splfile")
   or die "couldn't open: $!\n";

# try to seize lock  
unless ( flock(SPOOL, LOCK_EX | LOCK_NB) ) {
   # failed initial attempt
   print "$splfile busy\n";

   # now try again until successful or
   # 10 second alarm causes sigtrap
   eval {
      # define signal alarm
      local $SIG{'ALRM'} = sub { die "alarm\n" };

      print "waiting for lock\n";

      alarm 10; # timeout in ten seconds

      eval {
         # wait for a green light
         flock(SPOOL, LOCK_EX) or
            die "could not lock $splfile: $!\n";
      };
      # ! potential race !  
      # *after* successful lock, but
      # *before* alarm unsets
      alarm 0; # unset timer

      if ( $@ ) {
         if ( $@ eq "alarm\n" ) {
            # SIG{ALRM} trapped in inner eval
            die "timed out waiting for lock!\n";
         } else {
            # died in inner eval, flock complained
            die "$@";
         }
      }

   };

   if ( $@ ) {
      # race condition safeguard: the handler dies with "alarm\n",
      # so the comparison must include the newline
      if ( $@ eq "alarm\n" ) { 
         # now unset the alarm
         alarm 0;
      } else {
         die $@;
      }
   }

}

print "locked $splfile\n";

# am I totally on the wrong track with
# this seek()?  what if some nit-wit
# is editing the file by hand while
# we're operating on it?
seek(SPOOL, 0, 2) 
   or die "contention for $splfile detected: $!\n";

print "appending to mailbox\n";

print SPOOL "BAR!!\n\n";

close SPOOL;

Replies are listed 'Best First'.
Re: race condition? - flock and $SIG{ALRM}
by JanneVee (Friar) on Sep 17, 2000 at 14:22 UTC
    If avoiding data corruption is that important, why not use a transaction-based approach? That is, write the data to separate files, and once you have received the complete data, try to commit it to the main file with a separate script.

    But it is just an idea!

    JanneVee

RE (tilly) 1: race condition? - flock and $SIG{ALRM}
by tilly (Archbishop) on Sep 17, 2000 at 20:44 UTC
    To your first question, "Yes but don't worry about that race condition."

    If you get the lock and then die, has harm been done? Of far greater concern is that Perl's signal handler is not very reliable. If it matters to you then you really do not want to use it.

    Secondly, JanneVee is absolutely right that this is the kind of problem that transactions are meant for. If you can, just shove the problem to a database that has them. If you cannot, then make sure you work in units where you acquire the lock, do the work, and then commit it.

    Rules on transactional stuff that will save a lot of grief for you: Until the moment you commit, do not touch any original data. If committing takes time the first step is to back up the original data, move in changed data then delete backup. Under no circumstances at any point assume you will be successful. Finally look around and you will find a number of useful snippets for locking. For instance Simple Locking or you can find more from Super Search.
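The "back up, move in changed data, delete backup" rule above can be sketched roughly as follows. This is only an illustration under assumptions: `commit_file` and the `.new.$$` / `.bak` suffixes are hypothetical names, and it replaces a whole file rather than appending to a spool.

```perl
use strict;
use warnings;
use File::Copy qw(copy);

# Hypothetical helper (not from the thread): replace $path with $new_content
# without modifying the original in place.
sub commit_file {
    my ($path, $new_content) = @_;
    my $tmp = "$path.new.$$";   # assumed naming scheme for the staged copy
    my $bak = "$path.bak";      # assumed naming scheme for the backup

    # Stage the changed data in a separate file first.
    open(my $out, '>', $tmp) or die "could not create $tmp: $!\n";
    print {$out} $new_content or die "could not write $tmp: $!\n";
    close($out) or die "could not close $tmp: $!\n";

    # Back up the original before touching it.
    if (-e $path) {
        copy($path, $bak) or die "could not back up $path: $!\n";
    }

    # Move the changed data into place, then drop the backup.
    rename($tmp, $path) or die "could not commit $path: $!\n";
    unlink($bak);
    return 1;
}
```

Because the staged file lives next to the target, the final `rename` stays on one filesystem. If any step dies, the original file or its backup is still on disk.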

(atl: flocks and processes) RE: race condition? - flock and $SIG{ALRM}
by atl (Pilgrim) on Sep 17, 2000 at 21:26 UTC
    Let's analyze what happens if you get a flock and die before you can unset the alarm.

    • You got the lock, so no other process will write. Fine.
    • You die before unsetting the alarm. Since this is the first thing you do, that's equivalent to a timeout before. Fine.
    • What happens to the flock if your process dies? Since it is bound to an open filehandle, which gets closed upon process termination by the OS or even earlier by the perl interpreter (cleanup), it will simply vanish. The next process may claim the flock. Fine.
    So far, your code's fine. As for the reliability of signals ... there is some coverage in the perldocs and here in the Monastery. I don't have a link handy, you'll have to try a search.

    Have fun ...

    Andreas
    five days 'til YAPC::Europe ...

    Update: BTW, using a nonblocking flock and putting the process to sleep for a random time would eliminate the need for signals. Try something like three tries with varying random sleeping times, if unsuccessful, give up (i.e. die). Adjust number of tries and sleeping time for your needs.
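A minimal sketch of the signal-free approach from the update above. The helper name `lock_with_retries` and its parameters are made up for illustration; tune the number of tries and the maximum sleep to your load.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use Time::HiRes qw(sleep);   # allows fractional-second sleeps

# Hypothetical helper (not from the thread): try a non-blocking exclusive
# lock up to $tries times, sleeping a random interval of at most $max_sleep
# seconds between attempts. Returns 1 once the lock is held, 0 if every
# attempt failed.
sub lock_with_retries {
    my ($fh, $tries, $max_sleep) = @_;
    for my $attempt (1 .. $tries) {
        return 1 if flock($fh, LOCK_EX | LOCK_NB);
        # No point sleeping after the final failed attempt.
        sleep(rand($max_sleep)) if $attempt < $tries;
    }
    return 0;
}
```

A caller would do something like `lock_with_retries(\*SPOOL, 3, 2) or die "gave up waiting for lock\n";`. No signal handler is involved, so the alarm race cannot occur. The trade-off is that a waiter can lose out to a process that arrives later.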

RE: race condition? - flock and $SIG{ALRM}
by Zarathustra (Beadle) on Sep 17, 2000 at 23:17 UTC

    Thanks for the recent input I've received - much appreciated.

    I hope replying directly to my own node isn't considered too horribly faux pas - I'd like to clarify some things...

    The transaction gig will not be a possible solution to this particular project however, so I'm stuck with doing the best I can protecting a single file that will be appended to very often by multiple processes - these processes will all be separate instances of the same program. This program, called rtmail, is a mail filter that gets piped to from sendmail.

    What rtmail does is route mail to final destinations, depending upon the sender's domain. That's easy - it's the bit where rtmail also makes a backup of all mail sent to it on the local machine using different spool files, named according to each of the final remote destinations, that is the tricky part... This is mission critical stuff, and so I want to be as certain as possible that it's all being reliably and accurately backed off into these local mail spools.

    One thing I didn't mention in my post, is that most of those die()'s are actually replaced with a call to an exception handler I wrote that makes certain absolutely no mail, under any circumstance is ever to simply disappear into the void.

    So, what I've learned so far from your replies is that:

    Perl signal traps are unreliable. I really need the timeout though, so that if something goes terribly wrong, I don't have a seething mass of rtmails filling the process table on the mail server, all waiting indefinitely for a lock...

    The race condition between the successful flock and the alarm unset is possible, but not to be worried about. If that's the case, then I could simply bail on the inner eval and not ever bother with the extra checks on $@ . And when/if the race condition ever does occur, so what - it'll just get caught by my exception handler as if the flock timed out, so no actual damage or lost data will take place anyhow. But I would prefer, if possible, to not raise the exception routine if it's not truly an exception situation - perhaps I'm just being too retentive...
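One way to avoid raising the exception when the lock was in fact obtained is to record success in a flag and check that flag before looking at the error. The sketch below assumes a hypothetical helper `lock_with_timeout`. It narrows the window rather than closing it entirely, since a signal can still arrive between flock returning and the flag being set.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper (not from the thread): block on an exclusive lock for
# at most $timeout seconds. Returns 1 if the lock is held on return, 0 on a
# timeout, and dies on any other flock error.
sub lock_with_timeout {
    my ($fh, $timeout) = @_;
    my $got_lock = 0;
    eval {
        local $SIG{ALRM} = sub { die "alarm\n" };   # newline keeps the message exact
        alarm $timeout;
        eval {
            flock($fh, LOCK_EX) or die "could not flock: $!\n";
            $got_lock = 1;    # record success before the timer is cancelled
        };
        my $inner_err = $@;
        alarm 0;              # cancel the timer while our handler is still installed
        die $inner_err if $inner_err;
        1;
    };
    my $err = $@;
    return 1 if $got_lock;            # lock is held, even if ALRM fired just afterwards
    return 0 if $err eq "alarm\n";    # genuine timeout
    die $err;                         # flock failed for some other reason
}
```

With this shape the caller only invokes its exception routine when the function returns 0 or dies. A late alarm that fires after a successful flock is treated as success.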

    Thanks for everyone's insight, I really appreciate the help. Beers!

      Thus spoke Zarathustra:
      One thing I didn't mention in my post, is that most of those die()'s are actually replaced with a call to an exception handler I wrote ...

      Just a short note: if you catch the exception instead of letting the process die, you'll have to close the file yourself to get rid of the lock. Otherwise you might block other processes in the (rather rare) case the race condition occurs.

      As for using an exception or not ... if you need this file lock to continue normally, not getting it (including timeout) should be considered an exceptional condition. But that's a matter of programming style, certainly no "right" or "wrong" way ...
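A rough sketch of that point: when the failure is caught rather than fatal, close the handle on every path so the lock cannot outlive the attempt. The helper name `append_locked` and its return convention are assumptions made for this example.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper (not from the thread): open $path for appending, take
# an exclusive lock, run $work->($fh), and always close the handle again.
# Returns '' on success or an error message on failure.
sub append_locked {
    my ($path, $work) = @_;
    open(my $fh, '>>', $path) or return "could not open $path: $!\n";
    my $ok = eval {
        flock($fh, LOCK_EX) or die "could not flock $path: $!\n";
        $work->($fh);
        1;
    };
    my $err = $ok ? '' : ($@ || "unknown error\n");
    # Closing flushes buffered output and drops the lock on both paths.
    close($fh) or $err ||= "could not close $path: $!\n";
    return $err;
}
```

A caller can pass the returned message to its own exception routine. Whatever happened inside `$work`, the next process is free to take the lock.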

      Beers! Yup! :-))

      Andreas

        if you catch the exception instead of letting the process die, you'll have to close the file yourself to get rid of the lock. Otherwise you might block other processes in the (rather rare) case the race condition occurs.

        Hey, thanks - you're absolutely right. That's one bit I had completely overlooked...

        Good eye!

      Ah well, I'm new here - guess it shows. (c8=