in reply to Race condition in my cron daemon

Handling database connections in a forking server is very difficult to get right. In this case I recommend that you set the InactiveDestroy attribute on the inherited handle in the child process right after forking, and then open a new connection for any further database interaction in the child. Do not try to continue using a handle opened in the parent process from the child process.
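A minimal sketch of that fork pattern, runnable without a database. SimDBH is a hypothetical stand-in for a DBI handle: its DESTROY appends a "disconnect" line to a log file (in place of the goodbye a real driver sends to the server) unless InactiveDestroy is set, mirroring what the DBI attribute does for a real handle. With real DBI you would set `$dbh->{InactiveDestroy} = 1` on the inherited handle and call `DBI->connect(...)` for the child's new one.

```perl
use strict;
use warnings;

# SimDBH: hypothetical stand-in for a DBI database handle. DESTROY logs a
# "disconnect" (standing in for the message a real driver sends to the
# server) unless InactiveDestroy is set.
package SimDBH;

sub new {
    my ($class, $log) = @_;
    return bless { log => $log, owner => $$, InactiveDestroy => 0 }, $class;
}

sub DESTROY {
    my $self = shift;
    return if $self->{InactiveDestroy};
    open my $fh, '>>', $self->{log} or die "cannot open $self->{log}: $!";
    print {$fh} "disconnect owner=$self->{owner} by=$$\n";
    close $fh;
}

package main;

# Fork a worker. The child neutralises the handle it inherited and opens
# its own; the parent keeps sole ownership of the original connection.
sub run_worker {
    my ($log) = @_;
    my $dbh = SimDBH->new($log);             # opened in the parent
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        $dbh->{InactiveDestroy} = 1;         # never close the parent's connection
        undef $dbh;                          # DESTROY runs but sends nothing
        my $child_dbh = SimDBH->new($log);   # child's own, fresh connection
        # ... the child's database work goes here ...
        undef $child_dbh;                    # closes only the child's connection
        exit 0;
    }
    waitpid($pid, 0);
    undef $dbh;                              # parent closes its own connection
    return $pid;
}
```

After `run_worker` returns, every line in the log should show the same PID for `owner` and `by`: no process ever disconnects a handle it did not open.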

Re^2: Race condition in my cron daemon
by clinton (Priest) on Mar 20, 2006 at 18:28 UTC
    Perrin - thanks - I'll give InactiveDestroy a go. I didn't know about that.

    But I have a feeling that that isn't the problem, because the same connection ID is used to obtain the lock and to update the jobs table (both of which happen in the parent process) - and this I can see in the database log. The REPLACE statement finally runs when I shut down the server, so something somewhere is hanging onto that lock.

    As for reusing the connection in the child, I specifically clear out the DBI connection cache and request a new connection in the child. Again, in the logs, I can see that the parent and child are using separate, new connections.

      I strongly suggest following perrin's advice, and not discounting it unless it doesn't work.

      The likely problem that perrin noted is that the database server winds up being talked to by both parent and child over the same inherited connection at the same time, and the database gets confused, resulting in unpredictable behaviour.

      It isn't visible to you here because the race comes from an implicit action that you don't see in your code. When you call MyStuff::DB::clear_cache() you remove the database handle in the child. That calls the handle's DESTROY method, which is likely to do cleanup, including telling the database, "I'm all done here." If at the same time the parent is trying to tell the database "Please do this work", the database can get confused in a million ways. For instance, the two messages might convince the database that it hasn't yet received the full message to act on, so it waits for the rest while the parent process waits for a response, leading to a hang.

      Use the InactiveDestroy parameter and the problem goes away because the child is no longer talking to the database behind the parent's back.
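The implicit DESTROY described above can be seen in a small simulation. SimHandle, %CACHE, get_handle and clear_cache are hypothetical stand-ins (the real MyStuff::DB code isn't shown in the thread), and "sending" a message just means pushing onto an array. Emptying the cache drops the last reference to the handle, so Perl calls DESTROY immediately; setting InactiveDestroy first turns that call into a no-op.

```perl
use strict;
use warnings;

# Hypothetical handle class: DESTROY records the goodbye it would send to
# the database server, unless InactiveDestroy is set.
package SimHandle;
our @SENT;    # messages "sent to the server" by DESTROY

sub new {
    my ($class, $id) = @_;
    return bless { id => $id, InactiveDestroy => 0 }, $class;
}

sub DESTROY {
    my $self = shift;
    push @SENT, "goodbye from $self->{id}" unless $self->{InactiveDestroy};
}

package main;

# Hypothetical connection cache, standing in for whatever
# MyStuff::DB::clear_cache() empties in the original post.
my %CACHE;
sub get_handle  { my $id = shift; $CACHE{$id} //= SimHandle->new($id) }
sub clear_cache { %CACHE = () }    # dropping the last reference runs DESTROY

# Case 1: no flag. Clearing the cache silently sends a goodbye.
get_handle('jobs');
clear_cache();
my @without_flag = @SimHandle::SENT;

# Case 2: flag set first, as the child should do right after fork().
@SimHandle::SENT = ();
get_handle('jobs')->{InactiveDestroy} = 1;
clear_cache();
my @with_flag = @SimHandle::SENT;

print "without flag: @without_flag\n";
print "with flag: @with_flag\n";
```

Nothing in the calling code mentions DESTROY, which is why the race is easy to miss.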

Re^2: Race condition in my cron daemon
by clinton (Priest) on Mar 22, 2006 at 12:33 UTC
    I have added a $dbh->{InactiveDestroy} = 1 in the child process, and so far so good. It is sporadic, so I won't know whether it is fixed until it has run for a while longer, but it's looking good so far.

    Many thanks, Perrin

      Based on your description, it may be something else. If that doesn't fix it, let us know.
        I couldn't get this DB lock working over the fork, so I've gone a different route:

        • Fork
        • Use Proc::PID::File to check that this job isn't running.
        • If it is:
          • Increment the number of attempts to start the job in the database
          • Select the number of attempts from the DB; complain if it is greater than X
          • Exit
        • If it isn't running:
          • Reset the number of attempts in the DB to zero
          • Run the job
        This way I avoid locking the DB at all.
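The steps above can be sketched in core Perl. Assumptions, not from the thread: an exclusive, non-blocking flock() on a per-job PID file stands in for Proc::PID::File (whose API is not reproduced here), and the attempts hashref stands in for the attempts counter in the database. In the real setup the counter has to live in the DB, because each forked child gets its own private copy of a Perl hash. `try_start_job` is a hypothetical name.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);

# Hypothetical sketch: flock() on a PID file replaces Proc::PID::File, and
# $arg{attempts} (a hashref) replaces the attempts column in the DB.
sub try_start_job {
    my %arg = @_;    # pidfile, job, attempts (hashref), max_attempts
    sysopen(my $fh, $arg{pidfile}, O_RDWR | O_CREAT)
        or die "cannot open $arg{pidfile}: $!";

    unless (flock($fh, LOCK_EX | LOCK_NB)) {
        # Already running: count the failed start, complain once it
        # exceeds the threshold, and let the caller exit.
        my $n = ++$arg{attempts}{ $arg{job} };
        warn "job $arg{job}: $n start attempts while still running\n"
            if $n > $arg{max_attempts};
        close $fh;
        return { started => 0, attempts => $n };
    }

    # Not running: reset the counter and record our PID in the locked file.
    $arg{attempts}{ $arg{job} } = 0;
    truncate($fh, 0) or die "cannot truncate $arg{pidfile}: $!";
    defined syswrite($fh, "$$\n") or die "cannot write $arg{pidfile}: $!";
    return { started => 1, fh => $fh };    # keep $fh open while the job runs
}
```

Hold on to the returned filehandle for as long as the job runs; the lock is released when it is closed or when the process exits, so a crashed job never leaves a stale lock behind.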

        Perrin, thanks for your help

        Nope - still getting the same issue. I have added these lines to the code for the child process:

            # Child
            # Get rid of old database connections
            $lock->{InactiveDestroy} = 1;   # added
            undef $lock;                    # added
            undef $db;                      # added
            MyStuff::DB::clear_cache();
            chdir '/' or die $!;
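One thing that may be worth ruling out in the child code above: InactiveDestroy is set on `$lock` only. If `$db`, or anything MyStuff::DB::clear_cache() empties, is a separate DBI handle inherited from the parent, dropping it still runs that handle's DESTROY in the child. A hypothetical helper (the name is illustrative) that flags every inherited handle before any of them is released, assuming each argument is a DBI handle or another hash-based object, not a wrapper around one:

```perl
use strict;
use warnings;

# Hypothetical helper for the child side of a fork: set InactiveDestroy on
# every inherited handle passed in (skipping undefined ones) before any
# undef, cache clear, or exit can trigger their DESTROY methods.
# Assumes each argument is a DBI handle or another hash-based object.
sub detach_inherited_handles {
    my @handles = grep { defined } @_;
    $_->{InactiveDestroy} = 1 for @handles;
    return scalar @handles;    # how many handles were flagged
}
```

In the child this would run before the `undef $lock; undef $db; MyStuff::DB::clear_cache();` lines, for example as `detach_inherited_handles($lock, $db, @cached)`, where `@cached` holds whatever handles the cache keeps. How to reach those depends on MyStuff::DB, which isn't shown in the thread.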

        I still get the same issue where the parent locks the 'jobs' table, then when it tries to use the same dbh to write the new PID, it hangs, waiting for the lock.

        This only happens sporadically, and only when the child process has something to do (so that it takes fractionally longer to complete than usual), but not always...

        Further help would be greatly appreciated...

        clint