How to asynchronously get notified of a child exit

Mostly Harmless has asked for the wisdom of the Perl Monks concerning the following question:

Greetings!
I am writing a parallel processing application. The process flow goes something like this:

f1()
    f2() in background.
    f3() in background.
f4()
wait for 'f2'
f5()
wait for 'f3'

where f1, f2, etc. are subroutines.
In this model, both the parent and the children do work. Some steps in the parent can proceed only after earlier spawned children have completed part of the work.

All tasks performed in each step are critical, and if any of them fails, the parent process must kill all children and shut itself down. For instance, if 'f3' hits an error such as a SEGV, or someone accidentally sends it a SIGKILL, the parent should learn about it and exit immediately.

However, the problem is that the parent performs several blocking tasks and learns of a child's death only when it calls 'wait'. The existing codebase already installs CHLD handlers at various points, so using SIGCHLD for this is ruled out.

I tried various alternatives:
1. Install an ALRM handler in the parent. This handler would periodically check whether a child has exited and, if so, die. But if any module installs its own ALRM handler, the whole scheme fails. I tried Alarm::Concurrent, but found it to be buggy. Also, the existing code calls sleep() in a couple of places, which may interfere with the ALRM handler.
2. Tie %SIG, capture CHLD, and override it with a custom subroutine. But this scheme also requires wait() and waitpid() to be overridden. This may be doable, but it looks too complicated and may affect several parts of the existing code that spawn foreground processes.
3. At various points in the parent code, call a function that checks whether a child has exited. This won't work while the parent is in a blocking call.
4. Make the parent do no work other than simple child monitoring. This needs a lot of code changes in my existing application and is not feasible.
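For reference, the check described in alternative 3 can be sketched with a non-blocking `waitpid`. This is only a minimal sketch: `check_child` is a hypothetical helper name, the forked child is a stand-in for `f2()`, and it assumes nothing else (such as one of the existing CHLD handlers) reaps the child first, in which case `waitpid` returns -1.

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';    # provides WNOHANG

# Hypothetical helper: returns nothing while the child is still running,
# otherwise a short description of how it ended.
sub check_child {
    my ($pid) = @_;
    my $reaped = waitpid($pid, WNOHANG);
    return if $reaped == 0;                          # still running
    return "already reaped or not our child" if $reaped == -1;
    my $status = $?;
    return ($status & 127)
        ? "killed by signal " . ($status & 127)
        : "exited with code " . ($status >> 8);
}

# The pipe only keeps the stand-in child alive until the parent releases it.
pipe(my $reader, my $writer) or die "pipe failed: $!";

my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    close $writer;
    my $ignored = <$reader>;    # blocks until the parent closes its end
    exit 3;                     # stand-in for f2() failing
}
close $reader;

my $before = check_child($pid); # child is still blocked, so this is undef
close $writer;                  # let the child finish

my $after;
until (defined($after = check_child($pid))) {
    select(undef, undef, undef, 0.05);   # short nap between checks
}
print "child result: $after\n";
```

The check itself never blocks, but as noted in the list it only runs when the parent gets around to calling it.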

Alas, UNIX signals suck! :-( Has anyone been stuck with this problem before? Any suggestions?
Thanks


Re: How to asynchronously get notified of a child exit by Fletch (Bishop) on Jul 07, 2005 at 18:16 UTC
You could probably whip up a state machine implementing this with POE and POE::Wheel::Run fairly easily. The background tasks would run in separate processes via POE::Wheel::Run, while the main process's session runs the others (waiting until the background wheels send it an event saying they are complete). -- We're looking for people in ATL
Re^2: How to asynchronously get notified of a child exit by Mostly Harmless (Initiate) on Jul 07, 2005 at 18:23 UTC
That's interesting. I have tried to read up on POE a couple of times, but found the learning curve to be steep :-( Can POE pre-empt a session? I thought something like yield() has to be called from the code block for the POE kernel to look at events. If a piece of code blocks for a long time, then the POE kernel wouldn't be aware of other sessions' events. Am I right, or am I talking nonsense? :-) Also, wouldn't I need to change my entire existing codebase to make it POE-enabled? I have a new module that spawns and controls processes, where I can use POE. I do not know how much work it would take to use POE for the rest of the code. Thanks!
Re^3: How to asynchronously get notified of a child exit by Fletch (Bishop) on Jul 07, 2005 at 18:49 UTC
If you use POE::Wheel::Run then you can run a sub in a separate forked process. That sub could run your existing `f2()` and then print something to STDOUT (say `f2 done`). The parent session in the parent process has something watching for output from that child wheel, which sends itself an "f2done" event (and upon receiving that event it starts whatever was waiting for `f2` to complete). It wouldn't be seamless, but it should be doable. -- We're looking for people in ATL
Re^4: How to asynchronously get notified of a child exit by Mostly Harmless (Initiate) on Jul 08, 2005 at 05:44 UTC
Re: How to asynchronously get notified of a child exit by Transient (Hermit) on Jul 07, 2005 at 17:58 UTC
It sounds as if you need a child to do the parent's blocking tasks. The parent should be a controller or do its own work, but preferably not both. Can you not move the parent's work into a child process? Then you would be able to monitor all children and separate your concerns. What you could possibly do (and I don't know if there are any guarantees on this) is capture the parent PID in a variable before forking, then install a $SIG{__DIE__} handler in the child that signals the parent PID with a HUP or something similar. Kludgey at best though =/
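The `__DIE__` idea might look roughly like the sketch below. It is illustrative only: the variable names are made up, `die "f2 failed\n"` stands in for a real failure inside `f2()`, and the choice of HUP follows the suggestion above rather than any requirement. The `$^S` check is an added assumption, so that exceptions raised inside an `eval` (which may still be caught) do not notify the parent.

```perl
use strict;
use warnings;

my $parent_pid   = $$;      # captured before forking
my $child_failed = 0;

# Parent side: a HUP from a child means "I died".
$SIG{HUP} = sub { $child_failed = 1 };

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    $SIG{HUP}     = 'DEFAULT';          # do not inherit the parent's handler
    $SIG{__DIE__} = sub {
        return if $^S;                  # inside an eval: may still be caught
        kill 'HUP', $parent_pid;        # tell the parent before dying
    };
    die "f2 failed\n";                  # stand-in for an error inside f2()
}

# The parent would normally be doing its own work here; waiting just
# keeps the sketch short.
waitpid($pid, 0);
print "parent was notified: $child_failed\n";
```

Because the child sends the signal before it exits, the parent's handler has run by the time `waitpid` returns.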
Re^2: How to asynchronously get notified of a child exit by Mostly Harmless (Initiate) on Jul 07, 2005 at 18:17 UTC
<quote> It sounds as if you need a child to do the parent's blocking tasks. </quote> Yes, that would have been preferable. But as I mentioned earlier, it would need a lot of changes to the legacy code, which is not feasible. Also, I want to spawn a process only for very select tasks, as there is a danger of the system being overloaded with too many processes. <quote> then make a $SIG{__DIE__} block that signals the parent PID with a HUP </quote> That's a good idea. But this won't work when the child hits fatal errors such as receiving a SIGKILL or a SEGV. PS: New to perlmonks. Not sure how to quote msgs, and too lazy to read up right now ;-)
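To illustrate the SIGKILL objection, here is a small sketch (variable names are illustrative, and the child kills itself only to simulate an outside `kill -9`). The child's `__DIE__` handler never runs, so the parent receives no HUP; the only evidence of the death is the wait status in `$?`, whose low 7 bits hold the signal number.

```perl
use strict;
use warnings;

my $parent_pid = $$;
my $got_hup    = 0;
$SIG{HUP} = sub { $got_hup = 1 };

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # This handler would notify the parent on die(), but SIGKILL
    # cannot be caught, so no Perl code runs when it arrives.
    $SIG{__DIE__} = sub { kill 'HUP', $parent_pid };
    kill 'KILL', $$;    # simulate someone sending SIGKILL
    sleep 10;           # not reached in practice
    exit 0;
}

waitpid($pid, 0);
my $status    = $?;
my $signal    = $status & 127;    # signal that killed the child, if any
my $exit_code = $status >> 8;     # exit code, meaningful when $signal == 0

print "HUP received: $got_hup, signal: $signal, exit code: $exit_code\n";
```

So any scheme that relies on the child reporting its own death still needs a wait-based (or pipe-based) check for this case.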
Re: How to asynchronously get notified of a child exit by salva (Canon) on Jul 08, 2005 at 10:41 UTC
Instead of using signals or wait & co. to check when children exit, you can create the child processes with `open(my $child, '|-')` and use the `$child` file handle inside a `select()`-driven event loop to detect when the child exits, handling any other blocking operations at the same time.
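A rough sketch of this approach follows, with two assumptions worth flagging: it uses the `'-|'` form of forking `open` (parent reads from the child) instead of `'|-'`, since end-of-file on a read handle is easy to spot with `select()`, and it uses the core IO::Select wrapper for brevity. The child kills itself only to simulate a crash, and `$ticks` stands in for the parent's other work.

```perl
use strict;
use warnings;
use IO::Select;

# Forking open: returns the child's PID in the parent and 0 in the child.
my $pid = open(my $child, '-|');
die "cannot fork: $!" unless defined $pid;

if ($pid == 0) {
    # Child: STDOUT is connected to the parent's $child handle.
    kill 'KILL', $$;    # simulate f2() being killed mid-run
    exit 0;
}

my $select = IO::Select->new($child);
my ($exited, $ticks) = (0, 0);

until ($exited) {
    if ($select->can_read(0.1)) {
        my $n = sysread($child, my $buf, 4096);
        if (!defined $n) {
            next if $!{EINTR};          # interrupted by a signal: retry
            die "read from child failed: $!";
        }
        if ($n == 0) { $exited = 1 }    # EOF: the child's end is closed
        else         { print "child said: $buf" }
    }
    else {
        $ticks++;       # timeout: a slot for the parent's other work
    }
}

close($child);          # reaps the child and sets $?
my $signal = $? & 127;
print "child gone, signal $signal, exit code ", $? >> 8, "\n";
```

EOF arrives however the child ends, including SIGKILL, because the kernel closes the pipe. One caveat: if the child spawns processes of its own that inherit its STDOUT, EOF is delayed until they exit too.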