in reply to Signal Handling and alarm()

Even without seeing the details of your itimer, it's easy to see what's happening. A process can only have one alarm set at a time, and calling alarm clobbers any earlier timer.
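
A quick way to see the clobbering for yourself: alarm returns the number of seconds that were left on the timer it replaces. This is only a minimal sketch, and the helper name seconds_clobbered is made up for illustration.

```perl
use strict;
use warnings;

# Arm one alarm, then arm a second one on top of it.
# Returns what the second alarm() call reports: the seconds that were
# still left on the first timer, which is now gone.
sub seconds_clobbered {
    my ($first, $second) = @_;
    alarm $first;                  # first timer
    my $left = alarm $second;      # replaces the first timer
    alarm 0;                       # cancel, so nothing fires later
    return $left;
}

my $left = seconds_clobbered(30, 5);
print "second alarm replaced a timer with $left seconds left\n";
```

Only the 5 second timer would ever have fired here. The 30 second one is silently dropped, which is the same thing that happens when two pieces of code in one process both rely on alarm.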

What condition forces you to restart? Do kids die unexpectedly? Do clients leave connections hanging? Some extra care in the kid's error handling may solve problems like that.

There are several approaches to choose from to remedy this design problem.

  1. Loop through select undef, undef, undef, $interval; instead of using alarm. Platform-dependent.
  2. Keep the parent process around and run the alarm code there. Restart any child which doesn't answer kill 0, $cpid.
  3. Keep the parent around and let a $SIG{CHLD} handler there restart when a child dies.

There are lots of variations you can consider.
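
Here is a rough sketch of one such variation. It combines the select sleep from 1 with the kill 0 probe from 2. The names spawn, child_alive and supervise are hypothetical, the restart cap exists only so the demo terminates, and waitpid is tried before the probe because an exited child that has not been reaped yet still answers kill 0.

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';    # WNOHANG

# Fork a child that runs $work, then leaves without running the
# parent's END blocks or flushing its buffers.
sub spawn {
    my ($work) = @_;
    my $cpid = fork();
    die "fork failed: $!" unless defined $cpid;
    if ($cpid == 0) { $work->(); POSIX::_exit(0); }
    return $cpid;
}

# True while the child still exists. waitpid reaps an exited child
# first, so a zombie is not mistaken for a live process.
sub child_alive {
    my ($cpid) = @_;
    return 0 if waitpid($cpid, WNOHANG) != 0;   # exited, or not our child
    return kill(0, $cpid) ? 1 : 0;              # signal 0: existence check only
}

# Look at the child every $interval seconds and restart it when it is
# gone, at most $max_restarts times. Returns the pids of every child started.
sub supervise {
    my ($work, $interval, $max_restarts) = @_;
    my @started = (spawn($work));
    while (1) {
        select undef, undef, undef, $interval;  # sleep without touching alarm
        next if child_alive($started[-1]);
        last if $max_restarts-- <= 0;
        push @started, spawn($work);
    }
    return @started;
}

# Demo: the worker returns at once, so each check finds it dead.
my @pids = supervise(sub { }, 0.2, 2);
print scalar(@pids), " children started\n";
```

Note that kill 0 only tells you the process exists. A child that is alive but hung will still answer, so you need some other test for that case.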

After Compline,
Zaxo

Re^2: Signal Handling and alarm()
by gnork (Scribe) on Jun 17, 2005 at 10:29 UTC
    It took me a while to find out that LWP's alarm() collides with the itimer. The reason for that behaviour was already clear.

    Until now I have been killing non-responding children in the external watchdog; the files used for monitoring contain the pid of the corresponding process.

    Solution #2 looks promising and I will look into that a bit further.

    Thx && Rgds,
    gnork

    Edit: The error handling can always be improved. Up to now I have had one or two restarts a week for 50 instances of these servers in a very busy environment. I know, no restarts would be better...

    cat /dev/world | perl -e "(/(^.*? \?) 42\!/) && (print $1))"
    errors->(c)