Re: Race condition in my cron daemon

Replies are listed 'Best First'.
Re^2: Race condition in my cron daemon by clinton (Priest) on Mar 20, 2006 at 18:28 UTC
Perrin - thanks - I'll give InactiveDestroy a go. I didn't know about that. But I have a feeling that it isn't the problem, because the same connection ID is used to obtain the lock and to update the jobs table (both of which happen in the parent process), and I can see this in the database log. The REPLACE statement finally runs when I shut down the server, so something somewhere is hanging onto that lock. As for reusing the connection in the child, I specifically clear out the DBI connection cache and request a new connection in the child. Again, in the logs, I can see that the parent and child are using different, new connections.
Re^3: Race condition in my cron daemon by tilly (Archbishop) on Mar 21, 2006 at 06:22 UTC
I strongly suggest following perrin's advice, and not discounting it unless it doesn't work. The likely problem that perrin noted is that the database server winds up being talked to by both parent and child at the same time. And the database gets confused, resulting in unpredictable behaviour.

It isn't visible to you here because the race is from an implicit action that you don't see in your code. When you call MyStuff::DB::clear_cache() you remove the database handle in the child. That calls the handle's DESTROY method, which is likely to do cleanup, including telling the database, "I'm all done here." If at the same time the parent is trying to tell the database "Please do this work" the database can get all confused in a million ways. For instance the two messages might confuse the database into thinking that it hasn't yet received the full message to act on, so it is waiting for the rest of the message while the parent process is waiting for a response, leading to a hang.

Use the InactiveDestroy parameter and the problem goes away because the child is no longer talking to the database behind the parent's back.
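To make the advice above concrete, here is a minimal sketch of setting InactiveDestroy on an inherited handle in a forked child. It is not the poster's code: the DSN uses DBI's bundled ExampleP driver purely as a stand-in so the sketch runs without a database server, and the variable names are illustrative.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBI's bundled ExampleP driver stands in for the real
# database so the sketch runs without a server.
my $dbh = DBI->connect('dbi:ExampleP:', '', '', { RaiseError => 1 });

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: flag the inherited handle so that destroying it here does
    # not disconnect the connection the parent is still using.
    $dbh->{InactiveDestroy} = 1;
    undef $dbh;
    # ... open a fresh connection and run the job here ...
    exit 0;
}

# Parent: carries on with its own handle while the child runs.
waitpid($pid, 0);
my $child_status = $? >> 8;
$dbh->disconnect;
```

The flag has to be set on every handle the child inherited, and before that handle goes out of scope in the child; setting it after the handle has already been destroyed has no effect.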
Re^2: Race condition in my cron daemon by clinton (Priest) on Mar 22, 2006 at 12:33 UTC
I have added a `$dbh->{InactiveDestroy} = 1` in the child process, and so far so good. It is sporadic, so I won't know that it is fixed until it has run for a while longer, but looking good so far. Many thanks, Perrin.
Re^3: Race condition in my cron daemon by perrin (Chancellor) on Mar 22, 2006 at 20:48 UTC
Based on your description, it may be something else. If that doesn't fix it, let us know.
Re^4: Race condition in my cron daemon by clinton (Priest) on Apr 10, 2006 at 19:02 UTC
I couldn't get this DB lock working over the fork, so I've gone a different route:

- Fork
- Use Proc::PID::File to check that this job isn't running.
- If it is:
  - Increment the number of attempts to start the job in the database
  - Select the number of attempts from the DB and complain if it is greater than X
  - Exit
- If it isn't running:
  - Reset the number of attempts in the DB to zero
  - Run the job

This way I avoid locking the DB at all. Perrin, thanks for your help.
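The decision logic in the post above could be sketched roughly as follows. This is only an illustration under stated assumptions: the fork is left out, a plain hash (`%attempts`) stands in for the attempts column in the database, `$MAX_ATTEMPTS` is an arbitrary value for "X", and the `start_job` name and its code-ref arguments are hypothetical. In the daemon, the `$is_running` check could wrap a call such as `Proc::PID::File->running(name => $job)`.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the attempts column in the jobs table.
my %attempts;
my $MAX_ATTEMPTS = 3;    # the "X" from the post; the value is an assumption

# $is_running: code ref returning true if the job is already running.
# $run:        code ref that actually performs the job.
sub start_job {
    my ($job, $is_running, $run) = @_;

    if ($is_running->($job)) {
        # Already running: count the failed start and complain if the
        # count has grown past the threshold, then give up.
        my $n = ++$attempts{$job};
        warn "$job: $n consecutive failed start attempts\n"
            if $n > $MAX_ATTEMPTS;
        return 'skipped';
    }

    # Not running: reset the counter and do the work.
    $attempts{$job} = 0;
    $run->($job);
    return 'ran';
}
```

Keeping the running check behind a code ref means the PID-file mechanism can be swapped or stubbed out without touching the counting logic.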
Re^5: Race condition in my cron daemon by perrin (Chancellor) on Apr 17, 2006 at 15:17 UTC
Re^4: Race condition in my cron daemon by clinton (Priest) on Mar 27, 2006 at 10:04 UTC
Nope - still getting the same issue. I have added the lines marked "# added" to the code for the child process:

    # Child
    # Get rid of old database connections
    $lock->{InactiveDestroy} = 1;    # added
    undef $lock;                     # added
    undef $db;                       # added
    MyStuff::DB::clear_cache();
    chdir '/' or die $!;

I still get the same issue where the parent locks the 'jobs' table, then when it tries to use the same dbh to write the new PID, it hangs, waiting for the lock. This only happens sporadically, and only when the child process has something to do (so that it takes fractionally longer to complete than usual), but not always...

Further help would be greatly appreciated...

clint