Zubinix has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks

I'm writing some code to support an automated build system that runs on many computers with a wide variety of operating systems, from flavours of Windows (Windows 2000 onwards) to Linux and various versions of Unix (HP-UX, Solaris, AIX, etc.). We use SSH as the basic transport mechanism to run scripts remotely.

My question is how to track child processes that run remotely on these machines so that if something goes wrong (or hangs indefinitely) the errant process can be identified and terminated? What is the most efficient way of doing this? I'm trying to avoid something that polls the process table using a tool like 'ps'.

Thanks in advance for the Wisdom of the Perl Monks!

Replies are listed 'Best First'.
Re: Tracking child processes
by andyford (Curate) on Sep 30, 2006 at 09:38 UTC
    You're probably gonna have to be more specific about your architecture and code and problems before anyone can help you.

    I don't know what timescales you're working on, but running an occasional ps shouldn't be that much of a burden. Have you tried?

    Check this out for a start: Proc::ProcessTable

    andyford
    or non-Perl: Andy Ford

      Thanks for the link to Proc::ProcessTable. It's hard to be specific because I have to tackle this problem generally. Don't worry about timescale; it's not important. The point of the post is to find out what methodologies people use to track process creation and do the tidy-up at the end. For example, I kick off a script which launches multiple scripts on remote machines. These scripts in turn have many child processes, which sometimes die before cleaning up, leaving orphan processes lying around that might be holding resources such as file handles. This interferes with subsequent automation runs.

      So, what methods are there in Perl for real-time process tracing? Such a method would allow clean-up of child processes without interfering with other processes on the machine (regardless of user). Has anyone tackled a problem like this before?
Re: Tracking child processes
by eyepopslikeamosquito (Archbishop) on Sep 30, 2006 at 12:39 UTC

    I've written a similar system, but for Unix boxes only. What I did was to make the main Unix build script a process group leader and record this process group locally (i.e. on the "build driver" machine) in a "build state" file when starting the build script remotely from the build driver machine.

    With that done, to kill the build on any Unix box from the build driver machine, I simply look up the process group of the build script from the local "build state" file and issue a remote command to kill that process group (i.e. kill -process-group-id), which will kill all build processes started from the main build script. As a precaution, after looking up the process group id, I do a remote "ps" command to list the processes belonging to that process group and prompt for confirmation before killing.
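A minimal local sketch of the process-group idea described above, assuming a Unix-like system. The helper names start_group_leader and kill_group are hypothetical, and the "build state" file and the ssh transport are left out; on a real build driver the kill would be issued through a remote command.

```perl
use strict;
use warnings;

# Hypothetical helper: start @cmd as the leader of a new process group.
# The leader's pid is also the process group id (pgid) to record.
sub start_group_leader {
    my @cmd = @_;
    defined(my $pid = fork()) or die "fork failed: $!";
    if ($pid == 0) {
        setpgrp(0, 0);                        # child: become group leader
        exec(@cmd) or die "exec failed: $!";
    }
    # Parent sets the group as well, so it exists before we return.
    # This fails harmlessly if the child has already exec'd.
    setpgrp($pid, $pid);
    return $pid;
}

# Hypothetical helper: signal every process in the group.
# In Perl, a negative signal name targets a process group.
sub kill_group {
    my ($pgid, $signal) = @_;
    $signal ||= 'TERM';
    return kill "-$signal", $pgid;
}

# Demo: the shell and both sleeps share one process group.
my $pgid = start_group_leader('sh', '-c', 'sleep 60 & sleep 60');
print "recorded process group $pgid\n";
kill_group($pgid);
waitpid($pgid, 0);                            # reap the leader
printf "leader ended by signal %d\n", $? & 127;
```

Any process that deliberately moves itself into a different process group or session escapes this kind of clean-up, which is one reason to list the group's processes with "ps" before killing, as described above.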

      This is similar to what I want. I think MS Windows has a similar concept to Unix's group leader. Was your system written in Perl?

        Yes, it was a work system written in Perl. I don't have the code available to me right now though because I'm at home.

        Early versions of Windows did not have a similar concept to Unix's process groups ... and so applications such as Visual Studio rolled their own complex custom schemes to achieve a similar effect. This was remedied with the introduction of Jobs in Windows 2000. The Win32::Job module (comes with ActivePerl, needs Windows 2000 and above) should be a suitable replacement for Unix process groups.
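As a rough illustration of the Win32::Job approach, here is a sketch based on my reading of that module's documented new/spawn/run methods. The wrapper name run_with_job and the 5-second limit are assumptions for the example, and the module is loaded at run time so the script still compiles on non-Windows machines.

```perl
use strict;
use warnings;

# Hypothetical wrapper: run one program inside a Windows Job with a time limit.
# Returns 1 if it finished in time, 0 if the limit expired,
# and undef if Win32::Job cannot be loaded (e.g. not on Windows).
sub run_with_job {
    my ($exe, $args, $timeout) = @_;
    return undef unless eval { require Win32::Job; 1 };

    my $job = Win32::Job->new
        or die "cannot create job: $^E";
    # spawn() takes the executable and the full command line,
    # and returns the pid of the new process.
    defined $job->spawn($exe, $args)
        or die "spawn failed: $^E";
    # run() waits up to $timeout seconds for the job's processes.
    return $job->run($timeout) ? 1 : 0;
}

if ($^O eq 'MSWin32') {
    my $ok = run_with_job($^X, 'perl -e "sleep 1"', 5);
    print defined $ok && $ok ? "finished in time\n" : "did not finish in time\n";
}
```

Processes started by the spawned program stay in the same job by default, which is what makes a job a stand-in for a Unix process group.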

Re: Tracking child processes
by zentara (Cardinal) on Sep 30, 2006 at 12:01 UTC
    My question is how to track child processes that run remotely on these machines so that if something goes wrong (or hangs indefinitely) the errant process can be identified and terminated?

    That's a pretty broad question, but the first model that pops into my head is one where, for every remote process you launch, you also launch a companion watcher script, which watches the process and reports back via sockets. Then at your control machine, you collect the progress reports and analyze them.

    You will need an event loop system for your collection script (like POE, Tk, Gtk2, Glib), so you can simultaneously collect socket data and analyze it.
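For a feel of what the collection side does, here is a small sketch that uses the core IO::Select module in place of a full event loop such as POE. That is a substitution for illustration, not what the post recommends. The function name collect_reports and the host names are hypothetical, and pipes stand in for the sockets.

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical collector: read newline-terminated reports from several
# handles until all reach EOF, calling $on_report->($name, $line) for each.
sub collect_reports {
    my ($handles, $on_report) = @_;       # $handles: { name => filehandle }
    my $sel = IO::Select->new;
    my (%name_of, %buf);
    for my $name (keys %$handles) {
        my $fh = $handles->{$name};
        $sel->add($fh);
        $name_of{ fileno $fh } = $name;
        $buf{ fileno $fh }     = '';
    }
    while ($sel->count) {
        for my $fh ($sel->can_read) {     # blocks until something is readable
            my $fd = fileno $fh;
            my $n  = sysread($fh, $buf{$fd}, 4096, length $buf{$fd});
            if (!$n) {                    # 0 is EOF, undef is an error
                $sel->remove($fh);
                next;
            }
            # Hand over every complete line received so far.
            while ($buf{$fd} =~ s/^(.*?)\n//) {
                $on_report->($name_of{$fd}, $1);
            }
        }
    }
}

# Demo with two pipes standing in for two watcher connections.
pipe(my $r1, my $w1) or die "pipe: $!";
pipe(my $r2, my $w2) or die "pipe: $!";
syswrite($w1, "build started\nbuild running\n");
syswrite($w2, "build started\n");
close $w1;
close $w2;
collect_reports({ hostA => $r1, hostB => $r2 },
                sub { print "$_[0]: $_[1]\n" });
```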

    The companion watcher script would sleep most of the time, waking up periodically to check the running time of its assigned pid, log entries, etc., and reporting them in over the socket.
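A companion watcher along these lines might look like the sketch below, assuming a Unix-like system. The names status_line and watch_pid, the report format, and the default interval and limit are all made up for illustration. The output handle could be an IO::Socket::INET connection to the control machine; STDOUT is used here to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: describe the watched pid in one report line.
sub status_line {
    my ($pid, $started, $limit) = @_;
    my $elapsed = time - $started;
    # Signal 0 only tests whether the pid can be signalled. EPERM means
    # the process exists but belongs to another user, so count it as alive.
    my $alive = kill(0, $pid) || $!{EPERM};
    my $state = !$alive           ? 'exited'
              : $elapsed > $limit ? 'overdue'
              :                     'running';
    return "pid=$pid elapsed=$elapsed state=$state";
}

# Hypothetical watcher loop: report every $interval seconds until the
# process has exited or has run for longer than $limit seconds.
sub watch_pid {
    my ($pid, $out, %opt) = @_;
    my $interval = $opt{interval} || 30;
    my $limit    = $opt{limit}    || 3600;
    my $started  = time;
    while (1) {
        my $line = status_line($pid, $started, $limit);
        print {$out} "$line\n";
        return $line unless $line =~ /state=running$/;
        sleep $interval;
    }
}

# Demo: one report about this script's own process.
print status_line($$, time, 3600), "\n";
```

The watcher only reports; deciding whether an overdue process should be killed stays with the control machine.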


    I'm not really a human, but I play one on earth. Cogito ergo sum a bum