PerlMonks  

Re^2: Parallel::ForkManager and wait_all_children

by RichardK (Parson)
on May 13, 2015 at 09:29 UTC ( [id://1126533]=note )


in reply to Re: Parallel::ForkManager and wait_all_children
in thread Parallel::ForkManager and wait_all_children

How's that supposed to work? If you exec another program, the running perl is terminated so can't send the alarm. Don't you have to use system() and kill the child process if it times out?

Re^3: Parallel::ForkManager and wait_all_children
by afoken (Chancellor) on May 13, 2015 at 19:20 UTC
    How's that supposed to work? If you exec another program, the running perl is terminated so can't send the alarm.

    So I think that a SIGALRM is delivered to the process started via exec(). Unless the process changes its signal handler for SIGALRM, that signal will kill the process.

    Let's test that:

        #!/usr/bin/perl

        use strict;
        use warnings;

        sub helper
        {
            # forked process, wastes 10 seconds
            for (1..10) {
                print "helper: start of second $_\n";
                select(undef,undef,undef,1); # poor man's sleep, without messing with alarm
                print "helper: end of second $_\n";
            }
        }

        sub main
        {
            # main process
            print "Helper will die in 5 seconds\n";
            alarm(5); # kill me in five seconds ...
            exec($^X,$0,"dummy argument") # start perl with this script and a parameter
                or die "Could not start helper: $!";
        }

        if (@ARGV) {
            helper();
        } else {
            main();
        }

    Output:

        >perl alarmtest.pl
        Helper will die in 5 seconds
        helper: start of second 1
        helper: end of second 1
        helper: start of second 2
        helper: end of second 2
        helper: start of second 3
        helper: end of second 3
        helper: start of second 4
        helper: end of second 4
        helper: start of second 5
        Alarm clock
        >

    Just for fun, let's add a signal handler for SIGALRM in the helper process:

        sub helper
        {
            $SIG{'ALRM'}=sub { print "I am immortal, you fool!\n" };
            # forked process, wastes 10 seconds
            for (1..10) {
                print "helper: start of second $_\n";
                select(undef,undef,undef,1); # poor man's sleep, without messing with alarm
                print "helper: end of second $_\n";
            }
        }

    Output:

        >perl alarmtest.pl
        Helper will die in 5 seconds
        helper: start of second 1
        helper: end of second 1
        helper: start of second 2
        helper: end of second 2
        helper: start of second 3
        helper: end of second 3
        helper: start of second 4
        helper: end of second 4
        helper: start of second 5
        I am immortal, you fool!
        helper: end of second 5
        helper: start of second 6
        helper: end of second 6
        helper: start of second 7
        helper: end of second 7
        helper: start of second 8
        helper: end of second 8
        helper: start of second 9
        helper: end of second 9
        helper: start of second 10
        helper: end of second 10
        >

    Alexander

    Updates:

    1. changed links from [man://...] (FreeBSD) to http://linux.die.net/... (Linux)
    2. added second example with non-default signal handler
    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      So I think that a SIGALRM is delivered to the process started via exec().

      exec doesn't start a process; it executes a program in the current process. That's why alarm works.
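
The point can be checked directly. What follows is a minimal sketch, assuming a Unix-like system: a child arms a one-second alarm and then calls exec, and the parent compares the PID it got from fork with the PID the exec'd program reports about itself. The helper name `pid_and_signal_across_exec` is hypothetical and not part of the thread's code.

```perl
use strict;
use warnings;
use POSIX qw( SIGALRM );

$| = 1;   # keep buffered output from being duplicated in the child

# Hypothetical helper: fork a child that arms a 1-second alarm and then
# exec()s a fresh perl, which prints its own PID and idles.
# Returns (PID from fork, PID reported after exec, terminating signal).
sub pid_and_signal_across_exec {
    pipe(my $reader, my $writer) or die "pipe: $!";

    my $child = fork();
    die "fork: $!" unless defined $child;

    if ($child == 0) {
        close($reader);
        open(STDOUT, '>&', $writer) or die "dup: $!";
        alarm(1);   # the timer belongs to this process ...
        exec($^X, '-e', '$| = 1; print "$$\n"; select(undef, undef, undef, 10)')
            or die "exec: $!";   # ... and is still pending after exec
    }

    close($writer);
    my $reported = <$reader>;
    chomp($reported) if defined $reported;
    waitpid($child, 0);
    return ($child, $reported, $? & 0x7F);
}

my ($forked_pid, $execed_pid, $signal) = pid_and_signal_across_exec();
print "PID from fork:  $forked_pid\n";
print "PID after exec: $execed_pid\n";
print "Terminated by SIGALRM\n" if $signal == SIGALRM;
```

Both PIDs should be identical, and the exec'd program should be terminated by the alarm that was set before exec.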

      Unless the process changes its signal handler for SIGALRM, that signal will kill the process.

      If need be, that can be handled, as seen here.

Re^3: Parallel::ForkManager and wait_all_children
by moritz (Cardinal) on May 13, 2015 at 13:17 UTC
Re^3: Parallel::ForkManager and wait_all_children
by ikegami (Patriarch) on May 13, 2015 at 21:36 UTC

    If you exec another program, the running perl is terminated so can't send the alarm.

    alarm causes the system to send SIGALRM to the current process.

    Don't you have to use system() and kill the child process if it times out?

    You can't kill a child if you're waiting for it to exit using system, so you'd have to replace system.

        use IPC::Open3 qw( open3 );
        use POSIX      qw( WNOHANG );

        use constant TIMEOUT => 60;

        sub wait_for_test_to_end {
           my ($pid) = @_;

           my $abs_timeout = time() + TIMEOUT;
           while (1) {
              return if waitpid($pid, WNOHANG) > 0;
              last if time() > $abs_timeout;
              sleep(1);
           }

           kill(ALRM => $pid);

           $abs_timeout = time() + 15;
           while (1) {
              return if waitpid($pid, WNOHANG) > 0;
              last if time() > $abs_timeout;
              sleep(1);
           }

           kill(KILL => $pid);
           waitpid($pid, 0);
        }

        while (1) {
           for my $runCommand (@runArray) {
              $forkMgr->start($runCommand) and next;

              my $pid = open3('<&STDIN', '>&STDOUT', '>&STDERR',
                 "/usr/localcw/opt/patrol/nagios/libexec/$runCommand");

              wait_for_test_to_end($pid);
              $forkMgr->finish($? & 0x7F ? 0x80 | ($? & 0x7F) : $? >> 8);
           }

           $forkMgr->wait_all_children;
           sleep 10;
        }

    And you're back to having a useless process between the manager and the test.

    On the plus side, you can use more complex conditions than a simple timeout. You can also forcibly kill the process if it doesn't respond to SIGALRM as the above demonstrates.
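
The expression passed to finish() in the code above folds the wait status in `$?` into a single exit code: a test killed by a signal is reported as 128 plus the signal number (the convention shells use), and a normal exit passes its exit code through. A small sketch of just that mapping, with a hypothetical helper name `status_to_exit_code`; the signal number 14 for SIGALRM is an assumption that holds on Linux.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the expression passed to finish().
# $status is a wait status as found in $? after waitpid().
sub status_to_exit_code {
    my ($status) = @_;
    my $signal = $status & 0x7F;       # low 7 bits: terminating signal, if any
    return $signal ? 0x80 | $signal    # killed by a signal: 128 + signal number
                   : $status >> 8;     # normal exit: the program's own exit code
}

printf "exit code 3 -> %d\n", status_to_exit_code(3 << 8);   # prints 3
printf "signal 14   -> %d\n", status_to_exit_code(14);       # prints 142
```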

      Thanks all for the replies.

      I've tried to synthesize the different approaches.

      To flesh this out a bit, I'm planning on using callbacks (run_on_wait) to manage notifying/killing a hung process. Using the alarm(TIMEOUT) doesn't really solve my problem. If I set the timeout to 60, no other processes will run until that 60 seconds has elapsed as the wait_all_children still isn't satisfied.

      wait_for_available_procs (which was newer than my version of Parallel::ForkManager--so I upgraded) didn't seem to make any difference.

      The callbacks indicate that everything stalls until the looping test4.sh script is killed.

          use strict;
          use warnings;
          use Parallel::ForkManager;

          use constant TIMEOUT => 60;

          my @runArray = ("test1.sh", "test2.sh", "test3.sh", "test4.sh", "test5.sh");
          my ($pid, $exitCode, $ident);

          my $forkMgr = Parallel::ForkManager->new(3);

          $forkMgr->run_on_start(
              sub {
                  ($pid, $ident) = @_;
                  print "Started ==> $ident\n";
              }
          );

          $forkMgr->run_on_finish(
              sub {
                  ($pid, $exitCode, $ident) = @_;
                  print "Ended ==> $ident\n";
              }
          );

          while (1) {
              for my $runCommand (@runArray) {
                  $forkMgr->start($runCommand) and next;
                  alarm(TIMEOUT);
                  system("/usr/localcw/opt/patrol/nagios/libexec/$runCommand") or die ("exec: $!\n");
              }
              $forkMgr->wait_all_children;
              sleep 10;
          }

          exit;
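
One detail in the loop above: it calls alarm() and then system(), not exec. system() runs the test in a further child process, so the SIGALRM goes to the intermediate perl that is waiting in system(), not to the test script. A minimal sketch of the alarm-then-exec variant discussed earlier in the thread, using plain fork() rather than Parallel::ForkManager so that it stays self-contained; the helper name `run_with_timeout` is hypothetical, and the sketch does not address the wait_all_children stall described in this post.

```perl
use strict;
use warnings;
use POSIX qw( SIGALRM );

$| = 1;

# Hypothetical helper: run @cmd in a child with a wall-clock limit.
# Plain fork() stands in for Parallel::ForkManager here.
# Returns the wait status ($?) of the child.
sub run_with_timeout {
    my ($timeout, @cmd) = @_;

    my $pid = fork();
    die "fork: $!" unless defined $pid;

    if ($pid == 0) {
        alarm($timeout);        # arm the timer in the child ...
        exec { $cmd[0] } @cmd   # ... then become the test itself: no extra process
            or die "exec: $!";
    }

    waitpid($pid, 0);
    return $?;
}

# A stand-in "test" that idles for 10 seconds, limited to 1 second.
my $status = run_with_timeout(1, $^X, '-e', 'select(undef, undef, undef, 10)');
print "test timed out\n" if ($status & 0x7F) == SIGALRM;
```

In the posted loop, the equivalent change would presumably be to replace the system() call with exec after alarm(TIMEOUT); that variant is untested here.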
