Tronen has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

Ok, this is the current situation:

I have a set of daemons that are running, and one Perl script that controls them. I didn't write this controlling script, but I am questioning its architecture.

Each daemon is executed like:

./name_daemontool.pl -id=UNIQUEID

The controlling process manages them with a simple command like:

`ps -ef | grep '_daemontool.pl'`

It then looks for UNIQUEID in that output to decide whether a daemon is running, and takes the appropriate action based on that.

HOWEVER: I think this is a weak approach. For example, in a RedHat environment where the COLUMNS environment variable is set to something small, the ps output is truncated and not all of the info is shown. That means daemons are sometimes believed to be down and get restarted. This happens often, so after a while 1000+ daemons are running and causing abnormal usage of system resources.

---

My hope of how this could be handled:

A daemon-controlling process, daemon_controller.pl, would be daemonized and would launch each name_daemontool.pl as a child process of itself. If a name_daemontool.pl dies, daemon_controller.pl would get notified and could take proper action from there...
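
Roughly, the idea could be sketched like this with plain fork/exec/waitpid. This is only an illustration under my own assumptions: the helper names spawn and reap_one are made up, and a real controller would also need daemonizing, logging, and some restart throttling.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: fork and exec one daemon, return the child's PID.
sub spawn {
    my (@cmd) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: replace ourselves with the daemon; _exit only runs if exec fails.
        exec { $cmd[0] } @cmd or POSIX::_exit(127);
    }
    return $pid;
}

# Block until any child exits. Returns (pid, exit code),
# or an empty list when there are no children left to wait for.
sub reap_one {
    my $pid = waitpid(-1, 0);
    return if $pid <= 0;
    return ($pid, $? >> 8);   # exit code only; signal bits are dropped
}

# Example control loop (the ids A and B are made up):
# my %running;
# for my $id (qw(A B)) {
#     $running{ spawn('./name_daemontool.pl', "-id=$id") } = $id;
# }
# while (my ($pid, $code) = reap_one()) {
#     my $id = delete $running{$pid};
#     $running{ spawn('./name_daemontool.pl', "-id=$id") } = $id;   # restart it
# }
```

Because the controller is the parent, the kernel tells it exactly which child died, so nothing has to be parsed out of ps.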

---

My question: what would the best approach be if we need to control this set of *_daemontool.pl processes and only need to be able to track whether they have died, start them, and stop them?

This might be too broad a question, but hopefully someone with similar experience can share how they solved this need.

Best regards,
Peter Lauri

Re: Control other processes
by almut (Canon) on Feb 29, 2008 at 16:34 UTC

    A common approach would be to store the PIDs of the started daemons (in a file, DB, whatever...), and then use kill with signal zero to check whether those PIDs are still alive. If not, restart them and remove the old PID from the list...  In case the process is alive, you may also want to check the process name, to make sure it's not some other program which has 'recycled' the same PID in the meantime.
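
A minimal sketch of that check, assuming the PIDs have already been read back from wherever they are stored. The function names pid_alive and pid_matches are made up for illustration, and pid_matches relies on a Linux-style /proc filesystem, which may not exist on other systems.

```perl
use strict;
use warnings;

# True if a process with this PID exists. Signal 0 delivers nothing;
# the kernel only performs the existence and permission checks.
sub pid_alive {
    my ($pid) = @_;
    return 1 if kill(0, $pid);
    return $!{EPERM} ? 1 : 0;   # EPERM: it exists but belongs to another user
}

# Assumption: Linux /proc. Guards against a recycled PID by checking
# that the command line of the process still matches our daemon.
sub pid_matches {
    my ($pid, $pattern) = @_;
    open my $fh, '<', "/proc/$pid/cmdline" or return 0;
    local $/;                    # slurp the whole file
    my $cmdline = <$fh> // '';
    $cmdline =~ tr/\0/ /;        # arguments are NUL-separated
    return $cmdline =~ $pattern ? 1 : 0;
}

# Example use (the stored PID 12345 is made up):
# my $pid = 12345;
# unless (pid_alive($pid) && pid_matches($pid, qr/_daemontool\.pl -id=UNIQUEID\b/)) {
#     # restart the daemon and record its new PID
# }
```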

Re: Control other processes
by pc88mxer (Vicar) on Feb 29, 2008 at 17:13 UTC
    HOWEVER: I think this is a weak approach. As in for example a RedHat environment where the COLUMNS env is set to something smaller, all info wouldn't be presented. ...

    You can always use ps --cols NNN -elf | grep ... or set the COLUMNS environment variable before invoking ps.

    Another option is to have each daemon open a Unix domain socket whose path is based on its id, e.g. daemon instance N opens /var/control/name_N. This can serve multiple purposes: 1) you can check whether the daemon is running by trying to connect to the socket. If the daemon isn't accepting on the socket then you can assume it isn't running. 2) you can talk to the daemon via the socket and issue control messages or get a status, etc.

    Update: Also with the socket approach, you can get the pid of the daemon that has the socket open by using lsof. This lets you hand off keeping track of this detail to the OS.
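
The socket check described above could look roughly like this. It is a sketch only: open_control_socket and daemon_is_up are made-up names, the backlog of 5 is arbitrary, and unlinking the path on startup assumes that no other live instance is using the same id.

```perl
use strict;
use warnings;
use IO::Socket::UNIX;
use Socket qw(SOCK_STREAM);

# Daemon side: listen on a per-instance socket path.
sub open_control_socket {
    my ($path) = @_;
    unlink $path;   # clear a stale socket file left behind by a crashed instance
    my $listener = IO::Socket::UNIX->new(
        Type   => SOCK_STREAM,
        Local  => $path,
        Listen => 5,
    );
    die "cannot listen on $path: $!" unless $listener;
    return $listener;
}

# Controller side: the daemon counts as up if something accepts connections
# on its socket. A missing or stale socket file makes the connect fail.
sub daemon_is_up {
    my ($path) = @_;
    my $sock = IO::Socket::UNIX->new(
        Type => SOCK_STREAM,
        Peer => $path,
    );
    return $sock ? 1 : 0;
}
```

The same connection can later carry control or status messages, so the liveness check and the control channel share one mechanism.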

Re: Control other processes
by TOD (Friar) on Mar 01, 2008 at 07:53 UTC
    although i consider pc88mxer's suggestion the most elegant, maybe even a small shell script can serve the purpose:
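    #!/bin/bash
    for PROC in daemon1 daemon2 daemon3 ...; do
        # grep -v grep keeps the grep process itself out of the match
        PID=$(ps aux | grep "$PROC" | grep -v grep | awk '{print $2}')
        # an empty PID means the daemon is not running, so start it in the background
        if [ -z "$PID" ]; then
            "$PROC" &
        fi
    done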
    --------------------------------
    masses are the opiate for religion.