gnork has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

I have some scripts running as daemons. They are using SOAP::Lite and subsequently LWP for HTTP transport of the SOAP requests.

These daemons are monitored. Each server process has an itimer set to 60 seconds, and its $SIG{ALRM} handler runs a small routine which touches a file. Another process stat()s these files, and if now - mtime exceeds 180 seconds, the corresponding server is restarted.

LWP uses alarm() for the connection timeout. For long-running operations/connections I get a connection timeout at the intervals of the itimer. Somehow the SIGALRM from setitimer seems to affect LWP as well. (Proof: without the itimer set, the connection timeout doesn't occur.) How can I prevent that?
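A minimal sketch of the collision, assuming Time::HiRes::setitimer is what arms the itimer: alarm() and ITIMER_REAL are the same underlying timer, so whichever is set last wins.

```perl
use strict;
use warnings;
use Time::HiRes qw(setitimer getitimer ITIMER_REAL);

# The watchdog heartbeat: ITIMER_REAL delivers SIGALRM every 60s.
local $SIG{ALRM} = sub { };            # would touch the monitor file
setitimer(ITIMER_REAL, 60, 60);

# LWP's timeout does the equivalent of this internally:
alarm(180);

# alarm() re-arms the very same ITIMER_REAL, with no repeat
# interval -- the 60-second heartbeat is silently gone:
my ($remaining, $interval) = getitimer(ITIMER_REAL);
printf "remaining=%.0fs interval=%.0fs\n", $remaining, $interval;
# $interval is now 0; and when LWP later calls alarm(0) to cancel
# its timeout, the timer is cancelled entirely.
alarm(0);
```

Conversely, when the 60-second itimer fires first, it interrupts whatever alarm() window LWP had open, which is why the "connection timeout" tracks the itimer interval.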

TIA
gnork

cat /dev/world | perl -e "(/(^.*? \?) 42\!/) && (print $1))"
errors->(c)

Replies are listed 'Best First'.
Re: Signal Handling and alarm()
by Zaxo (Archbishop) on Jun 16, 2005 at 15:46 UTC

    Even without seeing the details of your itimer, it's easy to see what's happening. A process has only one ITIMER_REAL/SIGALRM timer, and calling alarm clobbers any earlier setting of it.

    What condition is it that forces you to restart? Do kids die unexpectedly? Do clients leave connections hanging? Some extra care in the kids' error handling may solve problems like that.

    There are several approaches to choose from to remedy this design problem.

    1. Loop over select undef, undef, undef, $interval; — the four-argument select sleeps without touching SIGALRM. Platform-dependent.
    2. Keep the parent process around and run the alarm code there. Restart any child which doesn't answer kill 0, $cpid;.
    3. Keep the parent around and let a $SIG{CHLD} handler there restart when a child dies.
    There are lots of variations you can consider.
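    A rough sketch of option 2; the %kids bookkeeping, spawn_server(), and the file paths are invented names for illustration, not anything from the original setup:

```perl
use strict;
use warnings;
use POSIX ();

my %kids;    # pid => heartbeat file for that server (invented bookkeeping)

sub spawn_server {
    my ($heartbeat) = @_;
    defined( my $pid = fork ) or die "fork: $!";
    if ( $pid == 0 ) {
        # Child: the SOAP daemon runs here and never touches SIGALRM,
        # so LWP's alarm()-based timeout works undisturbed.
        exec $^X, 'soap_server.pl' or die "exec: $!";
    }
    $kids{$pid} = $heartbeat;
}

spawn_server("/var/run/soapd.$_.alive") for 1 .. 3;

while (1) {
    sleep 60;    # the parent, not the children, owns the 60s tick
    for my $pid ( keys %kids ) {
        if ( kill 0, $pid ) {
            utime undef, undef, $kids{$pid};    # touch heartbeat file
        }
        else {
            my $file = delete $kids{$pid};
            waitpid $pid, POSIX::WNOHANG();     # reap if it exited
            spawn_server($file);                # restart the server
        }
    }
}
```

    Since the parent does the touching itself, the external watchdog can stay as it is, or go away entirely.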

    After Compline,
    Zaxo

      It took me a while to find out that LWP's alarm() collides with the itimer; once I had, the reason for the behaviour was clear.

      Until now I have done the killing of non-responding children in the external watchdog; the files used for monitoring contain the pid of the corresponding process.

      Solution #2 looks promising and I will look into that a bit further.

      Thx && Rgds,
      gnork

      Edit: The error handling can always be improved; up to now I have one or two restarts a week across 50 instances of these servers in a very busy environment. I know, no restarts would be better...

Re: Signal Handling and alarm()
by mugwumpjism (Hermit) on Jun 16, 2005 at 21:53 UTC

    The general solution to this is to use POE, or in particular POE::Component::Client::UserAgent.

    That might sound like a formidable thing to want to do, but I've had success in the past doing this without completely re-engineering the program: just turn each iteration of the "event loop" you'll have written in your program into its own little POE session.
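    To give a flavour, here is a minimal sketch of just the heartbeat as a POE session. POE schedules delay() events through its event loop rather than SIGALRM, so nothing collides with alarm(); the touch_file helper and the path are made up for the example.

```perl
use strict;
use warnings;
use POE;

# Hypothetical helper: touch the file the watchdog stats.
sub touch_file {
    my ($path) = @_;
    open my $fh, '>>', $path and close $fh;    # create if missing
    utime undef, undef, $path;                 # bump mtime
}

POE::Session->create(
    inline_states => {
        _start => sub {
            # Schedule the first heartbeat; POE runs timers off its
            # event loop, so SIGALRM stays free for LWP.
            $_[KERNEL]->delay( heartbeat => 60 );
        },
        heartbeat => sub {
            touch_file('/tmp/soapd.alive');
            $_[KERNEL]->delay( heartbeat => 60 );    # re-arm
        },
    },
);

POE::Kernel->run;
```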

    $h=$ENV{HOME};my@q=split/\n\n/,`cat $h/.quotes`;$s="$h/." ."signature";$t=`cat $s`;print$t,"\n",$q[rand($#q)],"\n";
      Thanks for the hint, I will try that. Since the servers are running in a production environment, I will first take the quick route and switch from SIG{ALRM} to SIG{USR1}, sending a kill -s USR1 $pid from the outside to initiate touching the files. Then there is no collision between the two timers.
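      In sketch form (the heartbeat path is invented), the server side of that workaround looks like this:

```perl
use strict;
use warnings;

# In each server: move the heartbeat from SIGALRM to SIGUSR1,
# so it can no longer collide with LWP's alarm() timeout.
my $heartbeat = "/var/run/soapd.$$.alive";    # invented path
$SIG{USR1} = sub {
    open my $fh, '>>', $heartbeat and close $fh;    # create if missing
    utime undef, undef, $heartbeat;                 # bump mtime
};

# The external watchdog now drives the tick instead of an itimer:
#   kill -s USR1 $pid        # from the shell
#   kill 'USR1', $pid;       # from Perl
```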

      Another solution has been posted above. I will see which works better.

      I already spotted the places to turn into POE sessions, but the transition will take a while and has to be tested before going into production.

      Rgds,
      gnork
