in reply to Managing the fork/execing and reaping of child processes

Well, of course there is an inherent race condition in your program: what if you receive all the SIGCHLDs before the sleep? That is:

    ...
            # PARENT process
            say "Spawned child $active_readers pid $service_pid"
        }
    }
    ## All SIGCHLDs happen here
    sleep;
    ## No signals here, sleep forever

I'm pretty sure that's what happens.

Re^2: Managing the fork/execing and reaping of child processes
by Anonymous Monk on Jul 16, 2015 at 16:30 UTC
    What you should do is block all signals before the forks and use sigsuspend (or sigwaitinfo) after the forks.
      (which might be problematic since non-rt signals are not queued, but I don't know if you actually need that)
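
A minimal sketch of that idea in Perl, assuming the core POSIX module and blocking only SIGCHLD rather than every signal. The names spawn_and_wait and reap_children are hypothetical, and the child simply exits where the real program would exec. Because ordinary signals are not queued, each wakeup reaps every child that has already exited instead of assuming one signal per child.

```perl
use strict;
use warnings;
use POSIX qw(:signal_h :sys_wait_h);

# Reap every child that has already exited; one SIGCHLD may stand for several.
sub reap_children {
    my ($alive) = @_;
    my @reaped;
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        delete $alive->{$pid};
        push @reaped, $pid;
    }
    return @reaped;
}

# Hypothetical helper: fork $count children and wait for all of them.
sub spawn_and_wait {
    my ($count) = @_;

    # A handler must be installed, otherwise SIGCHLD is discarded by default
    # and sigsuspend would never be interrupted.
    local $SIG{CHLD} = sub { };

    # Block SIGCHLD before forking so no notification can slip past us.
    my $block = POSIX::SigSet->new(SIGCHLD);
    my $old   = POSIX::SigSet->new;
    sigprocmask(SIG_BLOCK, $block, $old) or die "sigprocmask: $!";

    my %alive;
    for my $n (1 .. $count) {
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: stand-in for the real exec. A child that execs should
            # restore the signal mask first, since the mask survives exec.
            POSIX::_exit(0);
        }
        $alive{$pid} = 1;
    }

    my @all;
    while (%alive) {
        push @all, reap_children(\%alive);
        last unless %alive;
        # Atomically restore the old mask (SIGCHLD unblocked) and wait.
        # A SIGCHLD that arrived while blocked is still pending, so this
        # returns at once instead of sleeping forever.
        sigsuspend($old);
    }

    sigprocmask(SIG_SETMASK, $old) or die "sigprocmask: $!";
    return @all;
}
```

Calling spawn_and_wait(3) should return the pids of the three reaped children. The point of the design is that the check for remaining children and the wait are no longer separated by a window in which a signal can be lost.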
Re^2: Managing the fork/execing and reaping of child processes
by ibm1620 (Hermit) on Jul 16, 2015 at 17:18 UTC
    Thanks for pointing that out: you're right, it is a race condition. I looked into sigsuspend and quickly became overwhelmed. A later post provided a simpler solution.
Re^2: Managing the fork/execing and reaping of child processes
by Anonymous Monk on Jul 18, 2015 at 09:04 UTC
    OK, I had time to look into it further.

    First, wow! I didn't even know that POSIX sigaction()... bypasses Perl safe signals - perlipc. Hmmm, makes sense, but shouldn't this phrase be in the POSIX docs? Anyway, I've removed all printing from the program and run it under strace. One failure mode is like I said:

        --- SIGCHLD (Child exited) @ 0 (0) ---
        rt_sigaction(SIGCHLD, NULL, {0x49ff50, [], SA_RESTORER|SA_NODEFER, 0x7fdd9a6810a0}, 8) = 0
        --- SIGCHLD (Child exited) @ 0 (0) ---
        rt_sigaction(SIGCHLD, NULL, {0x49ff50, [], SA_RESTORER|SA_NODEFER, 0x7fdd9a6810a0}, 8) = 0
        rt_sigreturn(0x7fdd99b95e40) = 0
        rt_sigreturn(0x7fdd99b95e40) = 0
        time([1437206528]) = 1437206528
        pause(^C <unfinished ...>

    It hangs here because there are no signals anymore.

    Interestingly enough, sometimes something else happens:

        rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
        --- SIGCHLD (Child exited) @ 0 (0) ---
        --- SIGCHLD (Child exited) @ 0 (0) ---
        rt_sigaction(SIGCHLD, NULL, {0x49ff50, [], SA_RESTORER|SA_NODEFER, 0x7efbfde8a0a0}, 8) = 0
        rt_sigreturn(0x7efbfd39ee40) = 136
        --- SIGSEGV (Segmentation fault) @ 0 (0) ---
        +++ killed by SIGSEGV +++

    That happens when two signals are delivered in rapid succession. Typically that is right after the call to sigprocmask( SIG_SETMASK, empty_set, NULL ) (signals are blocked before the call to clone, that is, fork, and unblocked after). It seems one signal is pending and is delivered, and another one is delivered immediately afterwards (but it can also happen without sigprocmask, when two children happen to terminate one right after another). That causes Perl 5.22 to get SIGSEGV. Removing SA_NODEFER just makes it always hang in the call to sleep after a while.

    So yeah, the combination of Perl's unsafe signals and one half of UNIX's unreliable signals doesn't work too well :-) (the other half of unreliable signals is SA_RESETHAND)

    Just use Parallel::ForkManager :-)
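
For what it's worth, a rough sketch of what that could look like, assuming Parallel::ForkManager is installed from CPAN. The run_jobs helper, the job names and the limit of 4 concurrent children are made up for illustration, and the child just exits where the real program would do its work.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

# Hypothetical helper: run one child per job, return exit codes keyed by job.
sub run_jobs {
    my (@jobs) = @_;
    my $pm = Parallel::ForkManager->new(4);    # at most 4 children at a time

    my %exit_code;
    # Called in the parent each time a child has been reaped.
    $pm->run_on_finish(sub {
        my ($pid, $code, $ident) = @_;
        $exit_code{$ident} = $code;
    });

    for my $job (@jobs) {
        # start() returns the child's pid in the parent and 0 in the child.
        $pm->start($job) and next;

        # Child: stand-in for the real work.
        $pm->finish(0);
    }

    # Blocks until every child has been reaped; no signal handler involved.
    $pm->wait_all_children;
    return %exit_code;
}
```

The module does the forking and the waitpid bookkeeping itself, so the program needs neither a SIGCHLD handler nor a bare sleep, which sidesteps the race altogether.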