Managing the fork/execing and reaping of child processes

by ibm1620 (Hermit)
on Jul 16, 2015 at 15:29 UTC ( [id://1135034]=perlquestion )

ibm1620 has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a driver program in Perl to keep 10 instances of a short-running program concurrently active. (This is by way of trying to recreate and debug a concurrency bug with the short-running program, which is written in C.) The problem is that my driver program itself exhibits a concurrency bug.

This is Perl V5.16.2, running on Linux (CentOS).

Here's a stripped down version of the driver program:

    #!/bin/env perl
    use 5.010;
    use warnings;
    use strict;
    use Carp;
    use File::Basename;
    use IO::Socket;
    use IO::Select;
    use IO::File;
    use POSIX qw{WNOHANG setsid};
    use POSIX qw{SIGHUP SIGTERM SIGUSR1};

    my $sigset = POSIX::SigSet->new();

    # SIGCHLD (17) child-process ended
    #
    my $CHLDaction = POSIX::SigAction->new( 'sigCHLD_handler', $sigset, &POSIX::SA_NODEFER );
    POSIX::sigaction( &POSIX::SIGCHLD, $CHLDaction );

    sub sigCHLD_handler {
        say "Enter sigCHLD handler";
    }

    say "Perl version: $^V";
    my $active_readers = 0;
    my $current_reader_limit = 10;
    while (1) {
        if ($active_readers < $current_reader_limit) {
            # Launch enough new readers to bring us up to the current limit
            while ($active_readers < $current_reader_limit) {
                ++$active_readers;
                my $service_pid;
                if ( !defined( $service_pid = fork ) ) {
                    say "Couldn't fork. Exit.";
                    exit;
                }
                # CHILD process
                #
                elsif ( 0 == $service_pid ) {
                    my $command = "echo Hello from child $active_readers " . 'pid $$';
                    exec $command;
                }
                # PARENT process
                say "Spawned child $active_readers pid $service_pid"
            }
        }
        sleep;
        say "Out of sleep, begin reaping";
        while ( ( my $kid = waitpid( -1, WNOHANG ) ) > 0 ) {
            --$active_readers;
            say "Reaping child process $kid, active readers now $active_readers";
        }
        say "Finished reaping";
    }

And here's the output from one run, before it freezes:
    $ tom_strip
    Perl version: v5.16.2
    Spawned child 1 pid 18734
    Spawned child 2 pid 18735
    Spawned child 3 pid 18736
    Spawned child 4 pid 18737
    Spawned child 5 pid 18738
    Spawned child 6 pid 18739
    Spawned child 7 pid 18740
    Spawned child 8 pid 18741
    Spawned child 9 pid 18742
    Spawned child 10 pid 18743
    Hello from child 1 pid 18734
    Enter sigCHLD handler
    Out of sleep, begin reaping
    Hello from child 2 pid 18735
    Reaping child process 18734, active readers now 9
    Finished reaping
    Hello from child 3 pid 18736
    Enter sigCHLD handler
    Hello from child 4 pid 18737
    Enter sigCHLD handler
    Hello from child 5 pid 18738
    Enter sigCHLD handler
    Hello from child 6 pid 18739
    Enter sigCHLD handler
    Enter sigCHLD handler
    Hello from child 7 pid 18740
    Enter sigCHLD handler
    Hello from child 9 pid 18742
    Hello from child 8 pid 18741
    Hello from child 10 pid 18743
    Hello from child 10 pid 18748

I do nothing in the signal handler that would cause a data race; it's there only to cause sleep to wake up.

In every case I've looked at, no child processes remain; only the driver program is left.

If someone could explain what's going wrong, and what the right way to accomplish this is, I'd appreciate it.

Replies are listed 'Best First'.
Re: Managing the fork/execing and reaping of child processes
by afoken (Chancellor) on Jul 16, 2015 at 17:29 UTC
        while (1) {
            if ($active_readers < $current_reader_limit) {
                # Launch enough new readers to bring us up to the current limit

    Have you considered using Parallel::ForkManager?

    DESCRIPTION

    This module is intended for use in operations that can be done in parallel where the number of processes to be forked off should be limited.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      By far the best suggestion . . . this is a thing already done.
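For readers who want to see what the Parallel::ForkManager suggestion looks like in practice, here is a minimal sketch. It is not the OP's driver: the helper name run_with_forkmanager, the bounded $total count and the trivial child command (the running perl with -e 1) are all assumptions made for illustration. Parallel::ForkManager is a CPAN module, so it has to be installed separately.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper (not from the thread): launch $total short-lived
# children, never more than $limit at once, and return how many finished.
sub run_with_forkmanager {
    my ($total, $limit) = @_;

    require Parallel::ForkManager;    # CPAN module, loaded on first use
    my $pm = Parallel::ForkManager->new($limit);

    my $finished = 0;
    # Called in the parent each time a child has been reaped.
    $pm->run_on_finish(sub {
        my ($pid, $exit_code) = @_;
        ++$finished;
    });

    for my $n (1 .. $total) {
        # start() blocks while $limit children are running; it returns
        # the child's pid in the parent and 0 in the child.
        $pm->start and next;

        # CHILD: stand-in for the real short-running program.
        exec($^X, '-e', '1') or POSIX::_exit(127);
    }

    $pm->wait_all_children;
    return $finished;
}
```

Called as run_with_forkmanager(100, 10), this keeps up to 10 children alive until 100 have run. The module does the counting and reaping itself, so the driver needs no signal handler and no sleep. For a driver that should never exit, the for loop could be replaced by while (1).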
Re: Managing the fork/execing and reaping of child processes
by QM (Parson) on Jul 16, 2015 at 15:46 UTC
    Perhaps not the issue, but ++$active_readers should be later in the parent, after checking if the fork was successful.

    Why the exec in line 47? You could just as easily print from perl.

    What system are you running on? I'm guessing there's a fork limit or something involved, and the perl script isn't getting any CPU or IO cycles, and is hanging there. Have you checked the CPU usage? Have you considered tracing the execution, and finding the last line executed before the hang?

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

      In the real program, I exec a C program. I just wanted to keep differences to a minimum in this stripped-down version.

      I will look into the $active_readers question.

        System is CentOS. I don't know what kind of PID limits there are. When the freeze occurs, there's no CPU being used on the box (I'm the only occupant currently). There are no zombies. What are you referring to by "tracing"?
      Moving ++$active_readers didn't help (not surprisingly; if the fork were to fail, my program would exit).
Re: Managing the fork/execing and reaping of child processes
by Anonymous Monk on Jul 16, 2015 at 16:56 UTC

    Simplify and add lightness (oh, wait, adding lightness is for aircraft design :)

        #!/usr/bin/perl
        # http://perlmonks.org/?node_id=1135034
        use strict;
        $| = 1;

        my $active_readers = 0;
        my $current_reader_limit = 10;
        my $total = 0;

        while(1)
        {
            if($active_readers < $current_reader_limit)
            {
                $active_readers++;
                ++$total;
                if(my $pid = fork) # parent
                {
                    print "Spawned child $active_readers pid $pid total $total\n";
                }
                elsif(defined $pid) # child
                {
                    exec "echo Hello from child $active_readers pid \$\$" or
                        die "exec failed with $!";
                }
                else # fork failed
                {
                    die "fork failed with $!";
                }
            }
            elsif((my $pid = wait()) > 0)
            {
                $active_readers--;
                print "Reaped $pid, active = $active_readers\n"
            }
        }

      Rendered readable:
          #!/usr/bin/perl
          # http://perlmonks.org/?node_id=1135034
          use strict;
          $| = 1;

          my $active_readers = 0;
          my $current_reader_limit = 10;
          my $total = 0;

          while(1) {
              if($active_readers < $current_reader_limit) {
                  $active_readers++;
                  ++$total;
                  if(my $pid = fork) { # parent
                      print "Spawned child $active_readers pid $pid total $total\n";
                  }
                  elsif(defined $pid) { # child
                      exec "echo Hello from child $active_readers pid \$\$" or
                          die "exec failed with $!";
                  }
                  else { # fork failed
                      die "fork failed with $!";
                  }
              }
              elsif((my $pid = wait()) > 0) {
                  $active_readers--;
                  print "Reaped $pid, active = $active_readers\n"
              }
          }

        Sigh...

      If the fork fails, you're still incrementing both $active_readers and $total. Those should probably be tracked in the parent.

        If the fork fails, the parent dies. Is that tracking enough? I figure that if the fork fails it's going to muck things up enough that the die is proper.

      It works. That's beautifully simple. Now to debug what I actually WANTED to debug. I can't thank you enough.
Re: Managing the fork/execing and reaping of child processes
by Anonymous Monk on Jul 16, 2015 at 16:28 UTC
    Well, but of course there is an inherent race condition in your program: what if you receive all SIGCHLDs before sleep? That is:

        ...
                # PARENT process
                say "Spawned child $active_readers pid $service_pid"
            }
        }
        ## All SIGCHLDs happen here
        sleep;
        ## No signals here, sleep forever

    I'm pretty sure that's what happens.
      What you should do is block all signals before the forks and use sigsuspend (or sigwaitinfo) after the forks.
        (which might be problematic since non-rt signals are not queued, but I don't know if you actually need that)
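A rough sketch of the block-then-sigsuspend idea, using only the core POSIX module. This is one possible reading of the advice above, not code from the thread: the helper name run_children, the bounded $total and the trivial child command are assumptions. It also uses an ordinary %SIG handler (Perl's deferred signals) rather than POSIX::sigaction.

```perl
use strict;
use warnings;
use POSIX qw(:signal_h WNOHANG);

# Hypothetical helper (not from the thread): run $total short-lived
# children with at most $limit alive at once; returns the reap count.
sub run_children {
    my ($total, $limit) = @_;

    # A handler must exist, otherwise SIGCHLD is discarded and can
    # never wake sigsuspend. The handler itself does nothing.
    local $SIG{CHLD} = sub { };

    # Block SIGCHLD so that an early child exit stays pending instead
    # of being delivered before the parent is ready to wait for it.
    my $block = POSIX::SigSet->new(SIGCHLD);
    my $old   = POSIX::SigSet->new;
    sigprocmask(SIG_BLOCK, $block, $old) or die "sigprocmask: $!";

    # Mask in effect only while waiting. Assumption: nothing else needs
    # to stay blocked during the wait, so an empty set is used.
    my $none = POSIX::SigSet->new;

    my ($active, $started, $reaped) = (0, 0, 0);
    while ($reaped < $total) {
        while ($active < $limit && $started < $total) {
            my $pid = fork;
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {
                # CHILD: restore the mask, then exec a stand-in command.
                sigprocmask(SIG_SETMASK, $old);
                exec($^X, '-e', '1') or POSIX::_exit(127);
            }
            ++$active;      # counted in the parent, after a successful fork
            ++$started;
        }

        # Reap everything that has already exited.
        my $got = 0;
        while ((my $kid = waitpid(-1, WNOHANG)) > 0) {
            --$active;
            ++$reaped;
            ++$got;
        }

        # Nothing to reap yet: atomically unblock SIGCHLD and wait.
        # A SIGCHLD that arrived earlier is still pending, so this
        # returns immediately instead of sleeping forever.
        sigsuspend($none) if !$got && $active > 0;
    }

    sigprocmask(SIG_SETMASK, $old);
    return $reaped;
}
```

Because several exits can collapse into one pending SIGCHLD, the loop never counts signals. It treats a wakeup only as a hint to call waitpid again, so a spurious or merged signal costs one extra pass and nothing more.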
      Thanks for pointing that out: you're right, it is a race condition. I looked into sigsuspend and quickly became overwhelmed. A later post provided a simpler solution.
      OK, I had time to look into it further.

      First, wow! I didn't even know that POSIX sigaction()... bypasses Perl safe signals - perlipc. Hmmm, makes sense but shouldn't this phrase be in POSIX? Anyway, I've removed all printing from the program and run it under strace. One failure mode is like I said:

          --- SIGCHLD (Child exited) @ 0 (0) ---
          rt_sigaction(SIGCHLD, NULL, {0x49ff50, [], SA_RESTORER|SA_NODEFER, 0x7fdd9a6810a0}, 8) = 0
          --- SIGCHLD (Child exited) @ 0 (0) ---
          rt_sigaction(SIGCHLD, NULL, {0x49ff50, [], SA_RESTORER|SA_NODEFER, 0x7fdd9a6810a0}, 8) = 0
          rt_sigreturn(0x7fdd99b95e40) = 0
          rt_sigreturn(0x7fdd99b95e40) = 0
          time([1437206528]) = 1437206528
          pause(^C <unfinished ...>

      It hangs here because there are no signals anymore.

      Interestingly enough, sometimes something else happens:

          rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
          --- SIGCHLD (Child exited) @ 0 (0) ---
          --- SIGCHLD (Child exited) @ 0 (0) ---
          rt_sigaction(SIGCHLD, NULL, {0x49ff50, [], SA_RESTORER|SA_NODEFER, 0x7efbfde8a0a0}, 8) = 0
          rt_sigreturn(0x7efbfd39ee40) = 136
          --- SIGSEGV (Segmentation fault) @ 0 (0) ---
          +++ killed by SIGSEGV +++

      That happens when two signals are delivered in rapid succession, typically right after the call to sigprocmask( SIG_SETMASK, empty_set, NULL ) (signals are blocked before the call to clone, that is, fork, and unblocked afterwards). It seems one signal is pending and is delivered, and another one is delivered immediately afterwards (but it can also happen without sigprocmask, when two children happen to terminate one right after another). That causes Perl 5.22 to get SIGSEGV. Removing SA_NODEFER makes it always hang in the call to sleep after a while instead.

      So yeah, the combination of Perl's unsafe signals and half of UNIX unreliable signals doesn't work too well :-) (the other half of unreliable signals is SA_RESETHAND)

      Just use Parallel::ForkManager :-)
Re: Managing the fork/execing and reaping of child processes
by flexvault (Monsignor) on Jul 16, 2015 at 18:38 UTC

    ibm1620,

    I have lots of code for using signals to control forked children. Since about 3-4 years ago, I have stopped using signals and that code! Trying to have the same Perl script be able to run on different *nixes just doesn't work like it used to.

    What I have found to work very well for me is to use 'kill 0' in the parent code to tell if the child still exists. I use 'usleep' to wait for a few milliseconds between tests in a 'while (1)' loop. I keep the child pids in a hash with the values being the initial time of the fork. I usually test for the memory usage of each child and if it starts to grow too quickly, I try to halt it gently, and if it doesn't exit then I force kill the process. (Note: That hasn't happened yet, but it may.)

    I'm not saying you can't get it to work on one distribution of *nix, but it seems that the different vendors are *improving* the signal process all the time :-). YMMV.

    Update: Corrected 'fork 0' to 'kill 0' as AM noticed.
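A minimal sketch of the polling approach described above, under some loudly stated assumptions. The helper name poll_children, the bounded $total, the 5 ms interval and the trivial child command are all made up for illustration. One detail the post does not mention: kill 0 still succeeds on an unreaped zombie, so this sketch sets SIGCHLD to 'IGNORE', which on Linux makes the kernel reap children automatically. The memory-usage check is left out.

```perl
use strict;
use warnings;
use POSIX ();
use Time::HiRes qw(usleep);

# Hypothetical helper (not from the thread): run $total short-lived
# children, at most $limit at once, detecting exits by polling.
sub poll_children {
    my ($total, $limit) = @_;

    # Assumption: with SIGCHLD ignored the kernel reaps children itself,
    # so kill 0 starts failing as soon as a child has exited.
    local $SIG{CHLD} = 'IGNORE';

    my %started_at;                      # pid => time of fork
    my ($launched, $finished) = (0, 0);

    while ($finished < $total) {
        # Top up to the limit.
        while (keys(%started_at) < $limit && $launched < $total) {
            my $pid = fork;
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {
                # CHILD: stand-in for the real program.
                exec($^X, '-e', '1') or POSIX::_exit(127);
            }
            $started_at{$pid} = time;
            ++$launched;
        }

        # kill 0 sends no signal; it only reports whether the pid exists.
        for my $pid (keys %started_at) {
            next if kill 0, $pid;
            delete $started_at{$pid};
            ++$finished;
        }

        usleep(5_000) if %started_at;    # wait a few milliseconds
    }
    return $finished;
}
```

The trade-off is a few milliseconds of latency per poll in exchange for having no handler and no sleep/signal race. A pid could in principle be reused by an unrelated process between polls, so the stored start times are worth keeping if that matters.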

    Regards...Ed

    "Well done is better than well said." - Benjamin Franklin

      You meant "kill 0" instead of "fork 0", didn't you?

        Dear Anonymous Monks,

        You are correct!

        Regards...Ed

        "Well done is better than well said." - Benjamin Franklin

Re: Managing the fork/execing and reaping of child processes
by RichardK (Parson) on Jul 16, 2015 at 16:05 UTC

    Are you sure that your exec call worked? It can be tricky, see the docs for exec.

    You could check if it fails :-

    exec ('foo') or die "couldn't exec foo: $!";
      I added the ".. or die 'couldnt..'" but it doesn't get invoked. It seems to me that, in this case, exec (unlike fork) would either always work, or never work. The presence of "Hellos" in the output indicates to me that it's working.
Re: Managing the fork/execing and reaping of child processes
by BrowserUk (Patriarch) on Jul 16, 2015 at 15:36 UTC
    before it freezes:

    Hm. I see no way out of your while loop (except the exit if fork fails), so how are you expecting the program to end?


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
    I'm with torvalds on this Agile (and TDD) debunked I told'em LLVM was the way to go. But did they listen!
      My understanding of the OP is that it shouldn't exit.

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

        Correct. I don't want it to end.
Re: Managing the fork/execing and reaping of child processes
by ibm1620 (Hermit) on Jul 16, 2015 at 16:30 UTC
    UPDATE: I'd assumed that I was frozen in sleep, but I put a print in just before the sleep, and it doesn't get executed. Running under perl -d, I'm unable to ^C when the freeze occurs.
      Unbuffered print?
      لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
        Good thought - I added $|=1 but still no message output before sleep.
