Re: Managing the fork/execing and reaping of child processes

Re^2: Managing the fork/execing and reaping of child processes by Anonymous Monk on Jul 16, 2015 at 16:30 UTC
What you should do is block all signals before forking and use `sigsuspend` (or `sigwaitinfo`) after forking.
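For anyone following along, here is a minimal C-level sketch of that advice, since `sigprocmask` and `sigsuspend` are the system calls involved (Perl's POSIX module exposes functions with the same names). The helper name `spawn_and_reap` and the handler `on_sigchld` are made up for illustration, and the children simply exit at once instead of exec'ing anything. It is a sketch under those assumptions, not the original poster's program.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Flag set by the handler; sig_atomic_t is safe to write from a handler. */
static volatile sig_atomic_t got_sigchld = 0;

static void on_sigchld(int signo) { (void)signo; got_sigchld = 1; }

/* Hypothetical helper: fork n children that exit immediately, then reap
 * them all. Returns the number reaped, or -1 on error. */
int spawn_and_reap(int n)
{
    struct sigaction sa;
    sigset_t block, old, wait_mask;
    int reaped = 0;

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    /* Block SIGCHLD *before* forking, so it cannot be delivered
     * in the window between fork() and the wait below. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;
    got_sigchld = 0;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1) {
            sigprocmask(SIG_SETMASK, &old, NULL);
            return -1;
        }
        if (pid == 0)
            _exit(0);           /* child: a real program would exec here */
    }

    /* Mask used while waiting: the old mask with SIGCHLD unblocked. */
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);

    while (reaped < n) {
        /* sigsuspend() unblocks and sleeps atomically, so a SIGCHLD
         * that is already pending wakes it up at once (no pause() race). */
        while (!got_sigchld)
            sigsuspend(&wait_mask);
        got_sigchld = 0;

        /* One signal may stand for several exits: drain them all. */
        while (waitpid(-1, NULL, WNOHANG) > 0)
            reaped++;
    }

    sigprocmask(SIG_SETMASK, &old, NULL);
    return reaped;
}
```

Calling `spawn_and_reap(4)` from `main` should return 4. The point of the design is that the handler only ever runs inside `sigsuspend`, so there is no moment at which a signal can slip in between checking the flag and going to sleep.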
Re^3: Managing the fork/execing and reaping of child processes by Anonymous Monk on Jul 16, 2015 at 16:35 UTC
(which might be problematic since non-rt signals are not queued, but I don't know if you actually need that)
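To illustrate the queuing caveat: with standard (non-rt) signals, several children exiting while SIGCHLD is blocked can leave just one pending SIGCHLD, so each wakeup has to reap in a loop. Below is a rough C sketch using `sigwaitinfo`; the names `reap_with_sigwaitinfo` and `noop` are hypothetical, and `sigwaitinfo` is not available on every platform (it is on Linux).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void noop(int signo) { (void)signo; }

/* Hypothetical helper: fork n children that exit immediately and reap
 * them using sigwaitinfo(). Stores the number of wakeups in *wakeups.
 * Returns the number of children reaped, or -1 on error. */
int reap_with_sigwaitinfo(int n, int *wakeups)
{
    struct sigaction sa;
    sigset_t chld, old;
    int reaped = 0;

    *wakeups = 0;

    /* Install a do-nothing handler so SIGCHLD's disposition is not
     * "ignore"; waiting for an ignored signal is not portable. */
    sa.sa_handler = noop;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    /* Block SIGCHLD before forking; it stays blocked throughout. */
    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &chld, &old) == -1)
        return -1;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1) {
            sigprocmask(SIG_SETMASK, &old, NULL);
            return -1;
        }
        if (pid == 0)
            _exit(0);
    }

    while (reaped < n) {
        /* Synchronously accept one pending SIGCHLD. */
        if (sigwaitinfo(&chld, NULL) == -1) {
            if (errno == EINTR)
                continue;
            sigprocmask(SIG_SETMASK, &old, NULL);
            return -1;
        }
        (*wakeups)++;

        /* Signals are not queued, so do not assume one exit per
         * signal: reap until waitpid() has nothing more to report. */
        while (waitpid(-1, NULL, WNOHANG) > 0)
            reaped++;
    }

    sigprocmask(SIG_SETMASK, &old, NULL);
    return reaped;
}
```

With `n` set to 3, all three children are reaped, but the wakeup count can be anywhere from 1 to 3 depending on timing. That is why counting signals to count dead children does not work.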
Re^2: Managing the fork/execing and reaping of child processes by ibm1620 (Hermit) on Jul 16, 2015 at 17:18 UTC
Thanks for pointing that out: you're right, it is a race condition. I looked into sigsuspend and quickly became overwhelmed. A later post provided a simpler solution.
Re^2: Managing the fork/execing and reaping of child processes by Anonymous Monk on Jul 18, 2015 at 09:04 UTC
OK, I had time to look into it further. First, wow! I didn't even know that POSIX sigaction()... bypasses Perl safe signals - perlipc. Hmmm, makes sense, but shouldn't this phrase be in POSIX?

Anyway, I've removed all printing from the program and run it under strace. One failure mode is like I said:

```
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigaction(SIGCHLD, NULL, {0x49ff50, [], SA_RESTORER|SA_NODEFER, 0x7fdd9a6810a0}, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigaction(SIGCHLD, NULL, {0x49ff50, [], SA_RESTORER|SA_NODEFER, 0x7fdd9a6810a0}, 8) = 0
rt_sigreturn(0x7fdd99b95e40) = 0
rt_sigreturn(0x7fdd99b95e40) = 0
time([1437206528]) = 1437206528
pause(^C <unfinished ...>
```

It hangs here because both SIGCHLDs were already delivered before the call to `pause`, so there are no signals left to wake it up.

Interestingly enough, sometimes something else happens:

```
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigaction(SIGCHLD, NULL, {0x49ff50, [], SA_RESTORER|SA_NODEFER, 0x7efbfde8a0a0}, 8) = 0
rt_sigreturn(0x7efbfd39ee40) = 136
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
```

That happens when two signals are delivered in rapid succession. Typically that is after the call to `sigprocmask( SIG_SETMASK, empty_set, NULL )` (signals are blocked before the call to `clone`, that is, `fork`, and unblocked after). It seems one signal is pending and is delivered, and another one is delivered immediately afterwards (but it can also happen without `sigprocmask`, just when it so happens that two children terminate one right after another). That causes Perl 5.22 to get SIGSEGV.

Removing SA_NODEFER just makes it always hang in the call to `sleep` after a while.

So yeah, the combination of Perl's unsafe signals and one half of UNIX's unreliable signals doesn't work too well :-) (the other half of unreliable signals is SA_RESETHAND)

Just use `Parallel::ForkManager` :-)