Mostly Harmless has asked for the wisdom of the Perl Monks concerning the following question:

Greetings !
I am writing a parallel processing application. The process flow goes something like this:
    f1()
    f2() in background.
    f3() in background.
    f4()
    wait for 'f2'
    f5()
    wait for 'f3'

where f1, f2 ...etc. are subroutines.
In this model, both the parent and the children do work. Some steps in the parent can proceed only once earlier spawned children have completed some of the work.
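
A rough sketch of that flow with plain fork() and waitpid(), for reference. The bodies of f1 .. f5 are placeholders, spawn() and wait_for() are hypothetical helper names, and the reading that f4 and f5 run in the parent before each wait is an assumption about the flow listed above.

```perl
use strict;
use warnings;

$| = 1;    # flush STDOUT before forking so children do not replay buffered output

# Placeholder bodies; the real subroutines live in the existing codebase.
sub f1 { print "f1 (parent)\n" }
sub f2 { print "f2 (child $$)\n" }
sub f3 { print "f3 (child $$)\n" }
sub f4 { print "f4 (parent)\n" }
sub f5 { print "f5 (parent)\n" }

# Run $code in a forked child; the child's exit status reports success or failure.
sub spawn {
    my ($code) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    return $pid if $pid;                  # parent: hand back the child's pid
    my $ok = eval { $code->(); 1 };
    exit($ok ? 0 : 1);                    # child: never fall back into parent code
}

# Block until $pid exits; die if it was killed by a signal or exited non-zero.
sub wait_for {
    my ($name, $pid) = @_;
    waitpid($pid, 0) == $pid or die "$name: waitpid failed: $!\n";
    die "$name killed by signal " . ($? & 127) . "\n" if $? & 127;
    die "$name exited with status " . ($? >> 8) . "\n" if $? >> 8;
    return;
}

f1();
my $f2_pid = spawn(\&f2);
my $f3_pid = spawn(\&f3);
f4();
wait_for(f2 => $f2_pid);
f5();
wait_for(f3 => $f3_pid);
```

This shows the limitation in question: a failure in f3 is only noticed when the parent reaches wait_for(f3 => ...), not at the moment it happens.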

All tasks performed in each step are critical, and if any of them fails, the parent process must kill all children and shut itself down. For instance, if 'f3' encounters an error, such as a SEGV or someone accidentally sending it a SIGKILL, then the parent should know about it and exit immediately.

However, the problem is that the parent does several blocking tasks and would learn of a child's death only when it calls 'wait'. The existing codebase has several CHLD handlers installed at various points, so using SIGCHLD is ruled out.

I tried various alternatives:
1. Install an ALRM handler in the parent. This handler would periodically check if the child has exited, and if so, it would die. But if any module installs its own ALRM handler, the whole scheme would fail. I tried using Alarm::Concurrent, but found it to be buggy. Also, the existing code calls sleep() in a couple of places, which may interfere with the ALRM handler.
2. Tie %SIG and capture CHLD, and override it with a custom subroutine. But this scheme also needs wait() and waitpid() to be overridden. This may be doable, but looks too complicated, and may affect several parts of the existing code that spawn foreground processes.
3. At various parts of the parent code, call a function that would check if the child has exited. This won't work when the parent is in a blocking call.
4. Make the parent do no work, other than simple child monitoring. This needs lots of code changes for my existing application, and is not feasible.
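
For reference, alternative 3 could look something like the sketch below. check_children() is a hypothetical helper name, and the %children hash (pid => task name) is an assumed bookkeeping structure. It polls with waitpid() and WNOHANG, so it never blocks, and decodes $? to tell a signal death from a non-zero exit. The -1 branch covers the case where one of the existing CHLD handlers has already reaped the child.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Poll every known child without blocking; die as soon as one has failed.
# %$children maps pid => task name; reaped children are removed from it.
sub check_children {
    my ($children) = @_;
    for my $pid (keys %$children) {
        my $got = waitpid($pid, WNOHANG);
        next if $got == 0;                         # still running
        my $status = $?;
        my $name   = delete $children->{$pid};
        die "$name (pid $pid) was already reaped elsewhere\n" if $got < 0;
        my $signal = $status & 127;                # low 7 bits: terminating signal
        die "$name (pid $pid) killed by signal $signal\n" if $signal;
        my $exit = $status >> 8;                   # high byte: exit code
        die "$name (pid $pid) exited with status $exit\n" if $exit;
    }
    return;
}

# Demo: a child that is killed with SIGKILL, as in the 'f3' example.
my %children;
defined(my $pid = fork()) or die "fork failed: $!";
if ($pid == 0) { kill 'KILL', $$; sleep 5; exit 0 }
$children{$pid} = 'f3';

my $err = '';
while (%children) {
    select(undef, undef, undef, 0.05);             # stands in for a slice of parent work
    eval { check_children(\%children); 1 } or $err = $@;
}
print "parent shutting down: $err";
```

As the list item says, this only helps at the points where the parent actually gets to call the check; a long blocking call still hides the failure until it returns.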

Alas, UNIX signals suck! :-( Has anyone been stuck with this problem before? Any suggestions?
thanks

Replies are listed 'Best First'.
Re: How to asynchronisly get notified of a child exit
by Fletch (Bishop) on Jul 07, 2005 at 18:16 UTC

    You probably could whip up a state machine implementing this with POE and POE::Wheel::Run fairly easily. The background tasks would be run in separate processes with P::W::Run while the main process' session runs the others (waiting until the background wheels send it an event that they're complete).

    --
    We're looking for people in ATL

      That's interesting. I tried to read up on POE a couple of times, but found the learning curve to be steep :-(

      Can POE pre-empt a session? I thought something like yield() has to be called from the code block for the POE kernel to look at events. If some code blocks for quite some time, the POE kernel wouldn't be aware of other sessions' events. Am I right, or am I talking nonsense? :-)
      Also, wouldn't I need to change my entire existing codebase to make it POE-enabled? I have a new module that spawns and controls processes, where I can use POE. I do not know how much work it would involve to use POE for the rest of the code.

      thanks !

        If you use POE::Wheel::Run then you can run a sub in a separate forked process. That sub could run your existing f2() and then print something to STDOUT (say f2 done). The parent session in the parent process has something watching for output from that child wheel which sends itself a "f2done" event (and upon receiving that event it starts whatever was waiting for f2 to complete). It wouldn't be seamless, but it should be doable.


Re: How to asynchronisly get notified of a child exit
by Transient (Hermit) on Jul 07, 2005 at 17:58 UTC
    It sounds as if you need a child to do the parent's blocking tasks. The parent should be a controller or do its own work, but preferably not both. Can you not move the parent's work into a child process? Then you would be able to monitor all children and separate your concerns.

    What you could possibly do (and I don't know if there are any guarantees on this) is capture the parent PID in a variable before forking, then install a $SIG{__DIE__} handler in the child that signals the parent PID with a HUP or something similar. Kludgey at best though =/
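
    A minimal sketch of that idea, with assumptions flagged: the variable names are made up, HUP is an arbitrary choice of signal, and the parent here only sets a flag that its own code would have to check. It only covers failures that go through die().

```perl
use strict;
use warnings;

my $parent_pid   = $$;                     # captured before forking
my $child_failed = 0;
$SIG{HUP} = sub { $child_failed = 1 };     # parent side: just record the failure

defined(my $pid = fork()) or die "fork failed: $!";
if ($pid == 0) {
    # Child: report any uncaught die() to the parent before exiting.
    $SIG{HUP}     = 'DEFAULT';
    $SIG{__DIE__} = sub {
        return if $^S;                     # a die inside eval is not fatal; ignore it
        kill 'HUP', $parent_pid;
    };
    die "f3 failed\n";                     # stands in for a real error inside f3()
}

waitpid($pid, 0);                          # demo only: the real parent would keep working
print "child reported failure: ", ($child_failed ? "yes" : "no"), "\n";
```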
      <quote> It sounds as if you need a child to do the parent's blocking tasks. </quote>

      Yes, that would have been preferable. But as I mentioned earlier, it would need lots of changes to the legacy code, which is not feasible. Also, I want to spawn a process only for a few select tasks, as there's a danger of the system being overloaded with too many processes.

      <quote> then make an SIG{__DIE__} block that signals the parent PID with a HUP </quote>

      That's a good idea. But it won't work when the child hits a fatal error such as receiving a SIGKILL or a SEGV, because the __DIE__ handler never gets to run in those cases.

      PS: New to perlmonks. Not sure how to quote msgs, and too lazy to read up right now ;-)

Re: How to asynchronisly get notified of a child exit
by salva (Canon) on Jul 08, 2005 at 10:41 UTC
    Instead of using signals or wait & co to check when children exit, you can create the child processes with open(my $child, '|-') and use the $child filehandle inside a select()-driven event loop to detect when the child exits and handle any other blocking operation at the same time.
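
    A rough sketch of the pipe-plus-select idea. Note one swap: it uses the read direction, open(my $child, '-|'), rather than '|-', so that end-of-file on the handle is what signals the child's exit. The loop structure, the timeout value and the variable names are illustrative assumptions.

```perl
use strict;
use warnings;
use IO::Select;

# Forking open: the child's STDOUT becomes the parent's read handle $child.
my $pid = open(my $child, '-|');
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    # Child: f3() would run here. Simulate a crash by SIGKILL.
    syswrite(STDOUT, "f3 working\n");
    kill 'KILL', $$;
    sleep 5;
    exit 0;
}

my $sel = IO::Select->new($child);
my $status;
while ($sel->count) {
    for my $fh ($sel->can_read(0.25)) {
        my $n = sysread($fh, my $buf, 4096);
        next if !defined $n && $!{EINTR};      # interrupted by a signal; retry later
        if ($n) { print "child said: $buf"; next }
        # EOF (or a read error): every write end of the pipe is closed.
        $sel->remove($fh);
        close($fh);                            # waits for the child and sets $?
        $status = $?;
    }
    # Other filehandles the parent cares about can join the same IO::Select set.
}

my $signal = $status & 127;
print $signal ? "child killed by signal $signal\n"
              : "child exited with status " . ($status >> 8) . "\n";
```

    EOF arrives whether the child exits normally, dies, or is killed, so this covers the SIGKILL and SEGV cases without touching CHLD handlers. One caveat: if the child spawns processes of its own that inherit its STDOUT, EOF is delayed until they exit too. Blocking work that cannot be expressed as a filehandle in the select() set still delays detection.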