ric.techow has asked for the wisdom of the Perl Monks concerning the following question:

I have an existing system where a main program executes a number of instances of a Perl program and then suspends itself with a kill -STOP $$. The submitted programs write their PIDs to a common file and, when finished, remove their PID from it. The very last instance to finish deletes the common file and issues a kill -CONT to the main program.

The problem is that even though all the submitted programs finish, there are still PIDs left in the common file. Hence the last instance doesn't detect that it is the last, and doesn't issue a kill -CONT.

I can demonstrate that the code doesn't work by cutting it down to essentials, but I am not clear why it doesn't work. Multiple instances could open the common file at the same time, but there is a flock which should serialize the writes to the file.

Can this happen: instances 1 and 2 both open the file. Instance 1 flocks first and rewrites the file with its own PID removed. When instance 1 finishes, instance 2 flocks the file. However, the state of the file that instance 2 read still has instance 1's PID in it. So instance 2 rewrites the file with instance 1's PID, putting it back just after it was removed.

Is this plausible? A code snippet below.

Thanks for any help you can give me.

open(PID, "> $CONF{upd_lockfile}");   # Print all PIDs except this one.
flock(PID, 2);
seek(PID, 0, 2);
for my $a (@pids) {
    print(PID "$a\n") if ($a != $$);
}
flock(PID, 8);
close(PID);

Replies are listed 'Best First'.
Re: File locking Problem
by ikegami (Patriarch) on Dec 04, 2011 at 22:48 UTC

    OP's code:

    open(PID, "> $CONF{upd_lockfile}");
    flock(PID, 2);
    seek(PID, 0, 2);
    for my $a (@pids) {
        print(PID "$a\n") if ($a != $$);
    }
    flock(PID, 8);

    Problem #1: You don't prevent anyone from modifying the file between the time you read the PIDs into @pids and the time you write them back, so you can rewrite the file from a stale list.
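
    Spelled out as a timeline (read_pid_file is a stand-in for however the children load @pids; this is a sketch of the race, not the OP's actual code):

        # Child A                              # Child B
        @pids = read_pid_file();               @pids = read_pid_file();   # both lists hold A and B
        flock(PID, LOCK_EX);                   # B blocks on flock...
        # rewrite file from @pids minus A
        flock(PID, LOCK_UN);
                                               flock(PID, LOCK_EX);
                                               # rewrite file from its STALE @pids minus B,
                                               # which writes A's PID back into the file
                                               flock(PID, LOCK_UN);

    The flock serializes the writes, but not the read-modify-write cycle as a whole, so the last writer resurrects PIDs that were already removed.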

    Problem #2: You release the lock before anything actually reaches the file: the print output sits in Perl's output buffer until the handle is flushed or closed, and you unlock before either happens. Closing the file would flush its buffers, but you didn't do that. In fact, if you closed the file, you wouldn't even need the explicit unlock, since closing the handle releases the lock anyway.

    use Fcntl qw( LOCK_EX SEEK_SET );

    sub remove_pid {
        open(my $PID, '+>>', $CONF{upd_lockfile}) or die $!;
        flock($PID, LOCK_EX) or die $!;
        seek($PID, 0, SEEK_SET) or die $!;
        chomp( my @pids = <$PID> );
        @pids = grep $_ != $$, @pids;
        seek($PID, 0, SEEK_SET) or die $!;
        print($PID "$_\n") for @pids;
        truncate($PID, tell($PID)) or die $!;
        kill(CONT => $parent_pid) or die $! if !@pids;
        close($PID) or die $!;
    }

    But why go to all this trouble for a solution that's inherently fragile? (e.g. Consider what happens if a child gets killed.) Why not have the parent use wait or waitpid instead of suspending itself?
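
    That suggestion might be sketched like this (worker.pl and the job list are made up for illustration; the point is the structure, not the details): the parent forks the children itself and blocks in waitpid, so no PID file, no lock, and no STOP/CONT signalling are needed.

        use strict;
        use warnings;

        my @jobs = (1 .. 5);            # hypothetical work items
        my @kids;
        for my $job (@jobs) {
            my $pid = fork();
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {
                # Child: run the submitted program, then exit.
                exec($^X, 'worker.pl', $job) or die "exec failed: $!";
            }
            push @kids, $pid;
        }

        # Parent: block until every child has exited.
        waitpid($_, 0) for @kids;
        print "all children done\n";

    A child that dies abnormally is still reaped by waitpid, which is exactly the case where the lock-file scheme leaves a stale PID behind forever.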

Re: File locking Problem
by locked_user sundialsvc4 (Abbot) on Dec 05, 2011 at 16:14 UTC

    The algorithm that you describe seems very fragile to me. “It smells really funny. In fact, it smells really bad.” Is there any practical way to redesign this aspect of the system? I would suggest looking very hard for other options. flock() is often regarded as problematic because its behavior varies per filesystem, especially when networked (Unix) filesystems are involved. Perhaps this is a situation where you can make the business case that “we are just down the wrong rabbit-hole here,” such that a shift in the architecture of this one key aspect of the system might make a lot of headaches go away for good, while the time spent dinking around with it until then is simply an ongoing sunk cost. Maybe what you’ve got right now can’t be made truly reliable and satisfactory. Worth a thought ...