I am now using your code posted above (using open3 instead of exec/system, etc.):
use strict;
use warnings;
use Parallel::ForkManager;
use IPC::Open3 qw( open3 );
use POSIX qw( WNOHANG );

use constant TIMEOUT => 120;

my @runArray = ("test1.sh", "test2.sh", "test3.sh", "test4.sh", "test5.sh");
my ($pid, $exitCode, $ident);
my $currentTime;

my $forkMgr = Parallel::ForkManager->new(3);

$forkMgr->run_on_start(
    sub {
        ($pid, $ident) = @_;
        print "$currentTime Started ==> $ident\n";
    }
);

$forkMgr->run_on_finish(
    sub {
        ($pid, $exitCode, $ident) = @_;
        print "$currentTime Ended ==> $ident\n";
    }
);

while (1) {
    $currentTime = localtime();
    for my $runCommand (@runArray) {
        $forkMgr->start($runCommand) and next;
        my $pid = open3('<&STDIN', '>&STDOUT', '>&STDERR',
            "/usr/localcw/opt/patrol/nagios/libexec/$runCommand");
        wait_for_test_to_end($pid);
        $forkMgr->finish($? & 0x7F ? 0x80 | ($? & 0x7F) : $? >> 8);
    }
    $forkMgr->wait_all_children;
    sleep 10;
}
exit;

sub wait_for_test_to_end {
    my ($pid) = @_;

    my $abs_timeout = time() + TIMEOUT;
    while (1) {
        return if waitpid($pid, WNOHANG) > 0;
        last if time() > $abs_timeout;
        sleep(1);
    }

    kill(ALRM => $pid);

    $abs_timeout = time() + 15;
    while (1) {
        return if waitpid($pid, WNOHANG) > 0;
        last if time() > $abs_timeout;
        sleep(1);
    }

    kill(KILL => $pid);
    waitpid($pid, 0);
}
Still same behavior.
The looping test4.sh is still hanging everything up.
Here's some trace output. The "Started/Ended" lines come from the callbacks, and the "I am running..." lines come from the test1-5.sh scripts.
Thu May 14 16:52:40 2015 Started ==> test1.sh
Thu May 14 16:52:40 2015 Started ==> test2.sh
Thu May 14 16:52:41 CDT 2015 I am running test1.sh
Thu May 14 16:52:41 CDT 2015 I am running test2.sh
Thu May 14 16:52:41 CDT 2015 I am running test3.sh
Thu May 14 16:52:40 2015 Started ==> test3.sh
Thu May 14 16:52:40 2015 Ended ==> test3.sh
Thu May 14 16:52:40 2015 Ended ==> test2.sh
Thu May 14 16:52:40 2015 Ended ==> test1.sh
Thu May 14 16:52:40 2015 Started ==> test4.sh
Thu May 14 16:52:43 CDT 2015 I am running test4.sh
Thu May 14 16:52:43 CDT 2015 I am running test5.sh
Thu May 14 16:52:53 CDT 2015 I am running test4.sh
Thu May 14 16:53:03 CDT 2015 I am running test4.sh
Thu May 14 16:53:13 CDT 2015 I am running test4.sh
Thu May 14 16:53:23 CDT 2015 I am running test4.sh
Thu May 14 16:53:33 CDT 2015 I am running test4.sh
Thu May 14 16:53:43 CDT 2015 I am running test4.sh
Thu May 14 16:53:53 CDT 2015 I am running test4.sh
Thu May 14 16:54:03 CDT 2015 I am running test4.sh
Thu May 14 16:54:13 CDT 2015 I am running test4.sh
Thu May 14 16:54:23 CDT 2015 I am running test4.sh
Thu May 14 16:54:33 CDT 2015 I am running test4.sh
Thu May 14 16:54:43 CDT 2015 I am running test4.sh
Thu May 14 16:52:40 2015 Started ==> test5.sh
Thu May 14 16:52:40 2015 Ended ==> test5.sh
Thu May 14 16:52:40 2015 Ended ==> test4.sh
Thu May 14 16:54:56 2015 Started ==> test1.sh
Thu May 14 16:54:56 2015 Started ==> test2.sh
Thu May 14 16:54:56 CDT 2015 I am running test1.sh
Thu May 14 16:54:56 CDT 2015 I am running test2.sh
Thu May 14 16:54:56 CDT 2015 I am running test3.sh
Thu May 14 16:54:56 2015 Started ==> test3.sh
Thu May 14 16:54:56 2015 Ended ==> test2.sh
Thu May 14 16:54:56 2015 Ended ==> test1.sh
Thu May 14 16:54:56 2015 Ended ==> test3.sh
Thu May 14 16:54:56 2015 Started ==> test4.sh
Thu May 14 16:54:58 CDT 2015 I am running test4.sh
Thu May 14 16:54:58 CDT 2015 I am running test5.sh
Thu May 14 16:55:08 CDT 2015 I am running test4.sh
Thu May 14 16:55:18 CDT 2015 I am running test4.sh
etc.
During the 2-minute timeout wait before test4.sh is killed, nothing else happens (not even the run_on_start/run_on_finish callbacks for test5.sh). Two of the three allowed processes sit unused, I believe because Parallel::ForkManager is waiting for all the children to finish. I accept that one process will be tied up for the duration of the timeout, but I need the other two to keep processing the available work (test1-3.sh and test5.sh). I'll take care to ensure test4.sh isn't started again while one is already running (using a hash of running jobs managed by the callbacks).

That is the crux of my problem.
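
To make the goal concrete, here is a minimal sketch of the scheduling I'm after. It is not a tested fix for the script above. It assumes a Parallel::ForkManager recent enough to provide reap_finished_children, uses made-up stand-in commands ("slow" plays the part of the looping test4.sh), bounds the outer loop to two passes so the demo terminates, and leaves out the open3/timeout handling entirely.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

# Hypothetical stand-in commands, not the real Nagios checks.
my %command = (
    quick1 => 'true',
    slow   => 'sleep 3',    # plays the part of the looping test4.sh
    quick2 => 'true',
);
my @jobs = qw( quick1 slow quick2 );

my %running;        # ident => pid; only touched by the callbacks (parent side)
my %start_count;    # ident => number of starts, just for the demo output

my $forkMgr = Parallel::ForkManager->new(3);

$forkMgr->run_on_start(
    sub {
        my ($pid, $ident) = @_;
        $running{$ident} = $pid;
        $start_count{$ident}++;
    }
);

$forkMgr->run_on_finish(
    sub {
        my ($pid, $exitCode, $ident) = @_;
        delete $running{$ident};
    }
);

for my $pass (1 .. 2) {    # the real script would loop forever
    for my $job (@jobs) {
        # Non-blocking: runs run_on_finish for children that have exited.
        $forkMgr->reap_finished_children;

        # The previous run of this job is still going, so skip it this pass.
        next if exists $running{$job};

        $forkMgr->start($job) and next;    # parent moves on to the next job
        system($command{$job});            # child; timeout handling omitted
        $forkMgr->finish($? >> 8);
    }
    sleep 1;    # pause between passes, with no wait_all_children in the loop
}

$forkMgr->wait_all_children;    # block only once, at shutdown
print "$_ started $start_count{$_} time(s)\n" for sort keys %start_count;
```

If this behaves the way I expect, the quick jobs should be started on both passes while "slow" is started only once, because it is still in %running when the second pass comes around. The point of the sketch is that the blocking wait happens only at shutdown, so a long-running child occupies one slot without stalling the other two.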


In reply to Re^3: Parallel::ForkManager and wait_all_children by rgren925
in thread Parallel::ForkManager and wait_all_children by rgren925
