Re^2: Race condition in my cron daemon

Perrin - thanks - I'll give the InactiveDestroy a go. I didn't know about that.

But I have a feeling that that isn't the problem, because the same connection ID is used to obtain the lock and to update the jobs table (both of which happen in the parent process) - and this I can see in the database log. The REPLACE statement finally runs when I shut down the server, so something somewhere is hanging onto that lock.

As for reusing the connection in the child, I specifically clear out the DBI connection cache and request a new connection in the child. Again, I can see in the logs that the parent and child are using different, new connections.

Re^3: Race condition in my cron daemon by tilly (Archbishop) on Mar 21, 2006 at 06:22 UTC
I strongly suggest following perrin's advice, and not discounting it unless it doesn't work.

The likely problem that perrin noted is that the database server winds up being talked to by both parent and child at the same time. And the database gets confused, resulting in unpredictable behaviour.

It isn't visible to you here because the race is from an implicit action that you don't see in your code. When you call MyStuff::DB::clear_cache() you remove the database handle in the child. That calls the handle's DESTROY method, which is likely to do cleanup, including telling the database, "I'm all done here." If at the same time the parent is trying to tell the database "Please do this work" the database can get all confused in a million ways. For instance the two messages might confuse the database into thinking that it hasn't yet received the full message to act on so it is waiting for the rest of the message, while the parent process is waiting for a response - leading to a hang.

Use the InactiveDestroy parameter and the problem goes away because the child is no longer talking to the database behind the parent's back.
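For what it's worth, here is a rough sketch of what setting InactiveDestroy in the child might look like. I don't know what your MyStuff::DB code does internally, so this uses plain DBI calls; the DSN, credentials, and the guess that you are on MySQL (because of the REPLACE) are all assumptions of mine, not something taken from your code.

```perl
use strict;
use warnings;
use DBI;

# Hypothetical connection details: substitute your own DSN and credentials.
my @connect_args = (
    'dbi:mysql:database=cron', 'user', 'secret',
    { RaiseError => 1, AutoCommit => 1 },
);

# The parent's handle, opened before the fork and inherited by the child.
my $dbh = DBI->connect(@connect_args);

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: flag the inherited handle first, so that its DESTROY does
    # not disconnect or send anything over the parent's socket.
    $dbh->{InactiveDestroy} = 1;
    undef $dbh;    # now safe to drop the inherited handle

    # Open a connection that belongs to the child alone.
    my $child_dbh = DBI->connect(@connect_args);
    # ... run the job with $child_dbh ...
    $child_dbh->disconnect;
    exit 0;
}

# Parent: carries on using its own handle, undisturbed by the child.
$dbh->ping or die "parent lost its connection";
waitpid($pid, 0);
$dbh->disconnect;
```

The order matters: the flag has to be set on every inherited handle before anything in the child discards it, which in your case means before the cache is cleared. Later DBI releases also have an AutoInactiveDestroy attribute that does this automatically after a fork, if your installed version supports it.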
