in reply to $SIG{CHILD} misses on zombies

What does your signal handler look like? My guess is that you're only waiting for a single child when two or three could have all exited at about the same time.

use POSIX ":sys_wait_h";   # for WNOHANG
sub child_handler { 1 while waitpid(-1, WNOHANG) > 0; }   # stop at 0 (kids still running) or -1 (none left)

This way, if several children have died recently, you'll reap them all. Of course, if all you're trying to do is ignore the children's return codes and prevent zombies, just set $SIG{CHLD} to 'IGNORE', and on most Unix platforms perl will just Do What I (you) Mean.
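A minimal, self-contained sketch of the looping reaper, assuming a Unix-like system and the core POSIX module. It forks three children that exit at roughly the same moment and counts how many the handler reaps; the three-child setup and the `$reaped`/`$spawned` counters are illustrative only. The loop condition matters: with WNOHANG, waitpid returns 0 while children are still running, so testing `> 0` stops the loop there instead of spinning until every child has exited.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ":sys_wait_h";    # WNOHANG

my $reaped = 0;

# Reap every child that has already exited. waitpid returns a pid (> 0)
# for a reaped child, 0 if children remain but none has exited yet, and
# -1 if there are no children at all, so loop only while it returns > 0.
$SIG{CHLD} = sub {
    local ($!, $?);    # don't clobber errno/status seen by the main program
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $reaped++;
    }
};

# Start three children that all exit at roughly the same moment.
my $spawned = 0;
for (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    exit 0 if $pid == 0;    # child: exit immediately
    $spawned++;
}

# Wait (with a timeout) for the handler to catch up; sleep returns early
# when a signal arrives.
my $deadline = time() + 10;
sleep 1 while $reaped < $spawned && time() < $deadline;

print "spawned $spawned, reaped $reaped\n";
```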

Re^2: $SIG{CHILD} misses on zombies
by gmantz (Initiate) on Sep 06, 2005 at 06:22 UTC
    Hi, and thanks for the reply.
    I have written the same code as your example, with -1 as the pid parameter to waitpid, so I think I am waiting for every child to die.
    The problem is that when a child dies, the child_handler routine runs, but if another child dies while the routine is still handling the previous one, that second death is ignored (i.e. the program counter is already in the handler routine). I don't want to simply ignore the children's return codes and prevent zombies; I want to know whether everything I spawned is still running or has exited.
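The effect George describes is consistent with how ordinary Unix signals behave: they are not queued, so several child exits that happen while SIGCHLD is already pending collapse into a single delivery. The sketch below (an illustration assuming a Unix-like system, not code from the thread) makes that reproducible by blocking SIGCHLD with POSIX::sigprocmask, letting three children exit, and then unblocking it while counting handler runs against children reaped. Blocking stands in for the window in which the signal is already pending or being handled; the counters `$calls` and `$reaped` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(:signal_h :sys_wait_h);   # sigprocmask, SIGCHLD, SIG_*, WNOHANG

my ($calls, $reaped) = (0, 0);

# Count how often the handler runs, and reap everything that is ready each time.
$SIG{CHLD} = sub {
    $calls++;
    $reaped++ while waitpid(-1, WNOHANG) > 0;
};

# Block SIGCHLD so that child exits can only mark the signal as pending.
my $block = POSIX::SigSet->new(SIGCHLD);
my $old   = POSIX::SigSet->new;
sigprocmask(SIG_BLOCK, $block, $old) or die "sigprocmask: $!";

for (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    exit 0 if $pid == 0;    # child exits at once
}

sleep 1;    # give all three children time to exit while SIGCHLD is blocked

# Restore the old mask; the pending SIGCHLDs are delivered as one signal.
sigprocmask(SIG_SETMASK, $old) or die "sigprocmask: $!";

my $deadline = time() + 10;
sleep 1 while $reaped < 3 && time() < $deadline;

print "handler ran $calls time(s), reaped $reaped children\n";
```

On a typical Unix system one would expect the handler to run fewer times than there are children (often just once) while the loop inside it still reaps all three, which is why a handler that calls waitpid only once can leave zombies behind.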

      George,

      Are you doing the waitpid in a loop like I did? That should clear up all zombies that are waiting at that point.

      sub child_handler { my $i = 0; ++$i while waitpid(-1, WNOHANG) > 0; print "Reaped $i children\n"; }
      This may help you see how many children were waiting at a given time. You can also add more debugging, such as logging the output of ps, to see the state of every process on the system; then you can identify by hand what is actually running, zombied, etc., both before and after the waitpid loop.
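As an alternative to reading ps output, the parent can keep its own record of what it started, which also answers George's question of whether everything he spawned is running or has exited. In this sketch (an assumption, not code from the thread) every forked pid goes into a hypothetical `%spawned` hash, and the looping handler stores each reaped child's raw status in `%exited`; any pid in `%spawned` but not in `%exited` is still running. The handler only ever writes `%exited`, which avoids a race where a child exits before the parent has recorded its pid.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ":sys_wait_h";    # WNOHANG, WIFEXITED, WEXITSTATUS

my %spawned;    # pid => 1 for every child we forked
my %exited;     # pid => raw wait status ($?) once the child is reaped

$SIG{CHLD} = sub {
    local ($!, $?);
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $exited{$pid} = $?;
    }
};

# Hypothetical jobs: each child simply exits with the given code.
for my $code (0, 1, 2) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    exit $code if $pid == 0;
    $spawned{$pid} = 1;
}

my $deadline = time() + 10;
sleep 1 while keys(%exited) < keys(%spawned) && time() < $deadline;

# Report the state of every child we started.
for my $pid (sort { $a <=> $b } keys %spawned) {
    if (!exists $exited{$pid}) {
        print "$pid still running\n";
    } elsif (WIFEXITED($exited{$pid})) {
        printf "%d exited with code %d\n", $pid, WEXITSTATUS($exited{$pid});
    } else {
        print "$pid terminated abnormally (status $exited{$pid})\n";
    }
}
```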

        Hello again,
        I wrote it exactly as you did. In fact, I copied and pasted this code just to be sure. I still had dead children that the handler was unable to reap because they died exactly while the handler was busy with another child.
        Since I don't care about the pids of the dead processes, I tried using wait() and got it to work. The code looks like:
        while ($openforks > 0 && wait() != -1) { $openforks--; spawn(\@jobs); print "somebody died\n"; }

        In all the tests I ran, I reaped exactly as many children as I spawned.
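George's wait()-based loop can be fleshed out into a runnable sketch. `spawn()`, `@jobs` and the concurrency limit `$max_forks` are hypothetical stand-ins, since the thread does not show them; here each "job" is just the exit code a child returns. Note that wait() returns -1 when there are no children left, and -1 is true in Perl, so the sketch checks for -1 explicitly instead of relying on the return value's truth.

```perl
#!/usr/bin/perl
use strict;
use warnings;

$| = 1;    # autoflush, so buffered output is not duplicated into forked children

# Hypothetical job list: each "job" is the exit code its child returns.
my @jobs      = (0, 0, 1, 0, 2);
my $max_forks = 2;    # hypothetical concurrency limit
my $openforks = 0;
my ($started, $finished) = (0, 0);

# Hypothetical spawn(): start the next job (if any) in a child process.
sub spawn {
    my ($jobs) = @_;
    return unless @$jobs;
    my $code = shift @$jobs;
    my $pid  = fork();
    die "fork failed: $!" unless defined $pid;
    exit $code if $pid == 0;    # child "runs" the job and exits
    $openforks++;
    $started++;
}

spawn(\@jobs) for 1 .. $max_forks;

# No $SIG{CHLD} handler: the parent blocks in wait(), which reaps one
# child per call and returns -1 once no children remain.
while ($openforks > 0) {
    my $pid = wait();
    last if $pid == -1;
    $openforks--;
    $finished++;
    print "child $pid finished with exit code ", $? >> 8, "\n";
    spawn(\@jobs);    # replace it with the next job, if any
}

print "started $started, finished $finished\n";
```

Because the parent blocks in wait() instead of reacting to SIGCHLD, no exit can slip past unnoticed: each call reaps exactly one child, so `$openforks` stays in step with the children actually running.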
        Thanks for all the help though :)

        George