Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hola Monks,

I have a few, let's say 4, "sub" processes listening on an IP socket for work.

Since there is always the chance one will crash, what is the best way to tell if they are live?

I could try to connect() to each of them at a fixed interval, but what if they are busy? Since Perl isn't multi-threaded, I assume the connect() call from the parent will block... no good.

In the future, I would like to run the "sub" processes on a different machine, so checking whether the pid is live doesn't seem like a good solution either (I haven't yet looked for an IsPidLive($pid) function...).

Also, since I am on Win32, and hoping to support Linux in the future, I am not using fork()...

Thanks for any ideas !!!

Re: Best way to ensure process is live
by ask (Pilgrim) on Mar 02, 2002 at 06:51 UTC
    For monitoring, I would try connecting to it, or check that its work spool is not growing larger than X, or something else that works for the specific application.
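    A minimal sketch of the connect check, assuming the workers listen on TCP. The name is_listening and the host and port values in the usage note are hypothetical. The Timeout option of IO::Socket::INET bounds how long the parent waits, so a wedged or unreachable worker should not hang the monitor.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Return 1 if something accepts a TCP connection on $host:$port within
# $timeout seconds, 0 otherwise.
sub is_listening {
    my ($host, $port, $timeout) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout,   # upper bound on the connect attempt
    );
    return 0 unless $sock;      # refused, timed out, or unreachable
    close $sock;
    return 1;
}
```

    The parent could call something like is_listening('127.0.0.1', 9001, 2) for each worker at a fixed interval. Note that a successful connect only shows that the listening socket still exists: the kernel completes the handshake from the listen backlog even while the worker is busy, so this catches a crashed worker but not necessarily a hung one.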

    To keep a daemon alive I usually use the following script. If the process really wants to exit, it does so with exit code 10; otherwise it gets started again.

    #!/bin/sh
    # Keep a daemon running and running and running ...
    #
    # Normally starts a process in the background
    # unless "-bg" is given as the first argument.

    trap '' 1 15   # ignore SIG HUP & TERM

    PATH=/usr/xpg4/bin:$PATH   # Solaris compatibility

    # When run as root, re-run this script as user foo
    if test `id -u` = 0
    then
      su - foo -c "$0 $1" &
      exit
    fi

    # First invocation: restart ourselves in the background with -bg
    if test "$1" != "-bg"
    then
      echo "$0 $$: Starting $* in background"
      $0 -bg "$@" &
      exit
    fi

    shift               # remove the -bg
    PROGRAM=$1; shift   # get and remove program name

    while true
    do
      echo "$0 $$: Starting $PROGRAM $@"
      $PROGRAM "$@" < /dev/null
      status=$?
      if test $status -eq 10 ; then
        msg="$PROGRAM $@ exited with status 10 - not restarted"
        logger -p 'local0.warning' "$msg"
        echo "$msg"
        exit 0
      fi
      msg="$PROGRAM $@ exited with status $status - will restart (parent=$$)"
      logger -p 'local0.warning' "$msg"
      echo "$msg"
      sleep 2
    done

    (Replace foo with something else if you want to make sure you always start the program as a specific user.)

     - ask

    -- 
    ask bjoern hansen, http://ask.netcetera.dk/   !try; do();
    
Re: Best way to ensure process is live
by zengargoyle (Deacon) on Mar 02, 2002 at 04:58 UTC

    Check out Proc::Watchdog. Each process writes the current time to a file every so often. Another process checks the files every so often, and if the time within isn't recent enough (process wedged/gone) it will alarm/restart/whatever. It might suit your needs. You can also try kill 0, $pid. From the kill entry in perlfunc:

    If SIGNAL is zero, no signal is sent to the process. This is a useful way to check that the process is alive and hasn't changed its UID. See the perlport manpage for notes on the portability of this construct.
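    The heartbeat idea and the kill 0 check can be sketched by hand as follows. This is not the Proc::Watchdog API; the function names, the file layout (one epoch timestamp per file), and the age threshold are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Worker side: write the current epoch time to the heartbeat file.
sub touch_heartbeat {
    my ($file) = @_;
    open my $fh, '>', $file or die "cannot write $file: $!";
    print {$fh} time(), "\n";
    close $fh;
}

# Monitor side: 1 if the recorded time is at most $max_age seconds old,
# 0 if the file is missing, unreadable, malformed, or stale.
sub heartbeat_fresh {
    my ($file, $max_age) = @_;
    open my $fh, '<', $file or return 0;
    my $stamp = <$fh>;
    close $fh;
    return 0 unless defined $stamp && $stamp =~ /^(\d+)/;
    return (time() - $1) <= $max_age ? 1 : 0;
}

# Same-machine check only: kill 0 sends no signal, it just reports
# whether the pid exists and may be signalled by this user.
sub pid_alive {
    my ($pid) = @_;
    return kill(0, $pid) ? 1 : 0;
}
```

    A worker would call touch_heartbeat between jobs, and the monitor would treat heartbeat_fresh($file, 30) returning 0 as "wedged or gone". The heartbeat file also works across machines if it lives on shared storage and the clocks agree, whereas pid_alive only makes sense locally.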

Re: Best way to ensure process is live
by dws (Chancellor) on Mar 02, 2002 at 04:51 UTC
    I have a few, let's say 4, "sub" processes listening on an IP socket for work. Since there is always the chance one will crash, what is the best way to tell if they are live?

    Assuming you spawned the subprocesses via fork() (and remembered their pids), have the parent process wait(). When one of the subprocesses dies, wait() will return the dead process's pid to the parent, which can then spawn a new subprocess.
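    A rough sketch of that fork()/wait() loop, assuming a platform with a real fork() and that all workers run the same code. The names spawn_worker and supervise and the $max_restarts cap are hypothetical; the cap is only there so the sketch terminates.

```perl
use strict;
use warnings;

# Start one worker; returns the child's pid in the parent.
sub spawn_worker {
    my ($code) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {        # child: run the worker code, then exit
        $code->();
        exit 0;
    }
    return $pid;            # parent: remember the pid
}

# Start $count workers and respawn one each time wait() reports a death,
# up to $max_restarts times. Returns the number of restarts performed.
sub supervise {
    my ($count, $code, $max_restarts) = @_;
    my %workers;
    $workers{ spawn_worker($code) } = 1 for 1 .. $count;
    my $restarts = 0;
    while (%workers) {
        my $dead = wait();
        last if $dead == -1;                   # no children left
        next unless delete $workers{$dead};    # not one of ours
        if ($restarts < $max_restarts) {
            $workers{ spawn_worker($code) } = 1;
            $restarts++;
        }
    }
    return $restarts;
}
```

    Because wait() blocks until a child exits, the parent spends no time polling. The limitation is that wait() only reports workers that have exited; a worker stuck in a loop still looks alive.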

Re: Best way to ensure process is live
by ajwans (Scribe) on Mar 02, 2002 at 05:05 UTC
    The requirement of running the sub processes on different machines makes this a hard(TM) problem. If it were up to me, I would have each process listen on more than one port, with a separate handler for the "control" port.

    You would then need to connect this to a signal to ensure that you get a timely response from your control port, and make your other processing reentrant so that interruptions (to handle control messages) do not disrupt normal processing.
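    One way to sketch the two-port idea is shown below. Note that it swaps the signal for a select() loop via IO::Select, so control messages are only answered between jobs rather than interrupting one. The PING/PONG exchange, the function names, and the $do_work callback are assumptions made for illustration.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# Open a listening TCP socket on loopback; port 0 lets the OS pick one.
sub make_listener {
    my ($port) = @_;
    my $sock = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1',
        LocalPort => $port,
        Listen    => 5,
        Proto     => 'tcp',
        ReuseAddr => 1,
    ) or die "listen failed: $@";
    return $sock;
}

# Wait up to $timeout seconds, then serve every listener that has a
# pending connection. Control connections that send PING get PONG back;
# work connections are passed to $do_work. Returns connections handled.
sub serve_once {
    my ($work, $control, $do_work, $timeout) = @_;
    my $select  = IO::Select->new($work, $control);
    my $handled = 0;
    for my $ready ($select->can_read($timeout)) {
        my $client = $ready->accept or next;
        if ($ready == $control) {
            my $line = <$client>;
            print {$client} "PONG\n" if defined $line && $line =~ /^PING/;
        }
        else {
            $do_work->($client);
        }
        close $client;
        $handled++;
    }
    return $handled;
}
```

    The worker would call serve_once in a loop. A monitor that sends PING and gets no PONG within its own read timeout can treat the worker as wedged, and since this uses only sockets it should behave the same on another machine or on Win32.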

      Thanks, I'll investigate this idea...one followup question: what signal do I use? I am concerned that Win32 won't support it.
        Don't know about Win32. Under *nix you could use SIGUSR1 or SIGUSR2, which are reserved for user-defined purposes.
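        For the *nix case, a handler for SIGUSR1 might look like the sketch below. The flag variable and the check_status_request helper are hypothetical names. The handler only sets a flag, and the main loop decides when it is safe to act on it, which sidesteps most of the reentrancy worry. This is unlikely to carry over to Win32: perlport notes that kill there does not deliver signals the way it does on Unix.

```perl
use strict;
use warnings;

my $status_requested = 0;

# Keep the handler tiny: record the request and return.
$SIG{USR1} = sub { $status_requested = 1 };

# Called from the main loop at a safe point. Returns 1 once per
# received request and clears the flag, otherwise returns 0.
sub check_status_request {
    return 0 unless $status_requested;
    $status_requested = 0;
    return 1;
}
```

        The parent would send the request with kill 'USR1', $pid, and the worker would call check_status_request between units of work and answer on its control port.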