in reply to Re^3: Parallel::ForkManager and wait_all_children
in thread Parallel::ForkManager and wait_all_children

During the 2-minute timeout wait to kill test4.sh, nothing else is happening (not even the run_on_start/finish for test5.sh). I still have 2 forkable processes (of the defined 3) that are not being used

That's completely false!!!

on_start for test5 runs long before test4 finishes. Check the timestamps; your output is out of order. (You probably redirected the output, which caused STDOUT to become block buffered. Use $| = 1; to disable buffering on STDOUT.)

Thu May 14 16:52:40 2015 Started ==> test4.sh Thu May 14 16:52:40 2015 Started ==> test5.sh Thu May 14 16:52:43 CDT 2015 I am running test4.sh Thu May 14 16:52:43 CDT 2015 I am running test5.sh [If you were to print out when test5 ends, it would be here] Thu May 14 16:52:53 CDT 2015 I am running test4.sh Thu May 14 16:53:03 CDT 2015 I am running test4.sh Thu May 14 16:53:13 CDT 2015 I am running test4.sh Thu May 14 16:53:23 CDT 2015 I am running test4.sh Thu May 14 16:53:33 CDT 2015 I am running test4.sh Thu May 14 16:53:43 CDT 2015 I am running test4.sh Thu May 14 16:53:53 CDT 2015 I am running test4.sh Thu May 14 16:54:03 CDT 2015 I am running test4.sh Thu May 14 16:54:13 CDT 2015 I am running test4.sh Thu May 14 16:54:23 CDT 2015 I am running test4.sh Thu May 14 16:54:33 CDT 2015 I am running test4.sh Thu May 14 16:54:43 CDT 2015 I am running test4.sh

When test5 is running, there's only one unused process, and that's because there's nothing for that process to do. Only when test5 finishes do you have two unused processes, and that's because there's nothing for those two processes to do (since you want all processes to end before pausing before running the test suite again).

You are right about on_finish, though. on_finish code is not guaranteed to run immediately when the child ends. It will run when the parent reaps the child, which will happen at some point before wait_all_children returns.

Replies are listed 'Best First'.
Re^5: Parallel::ForkManager and wait_all_children
by rgren925 (Beadle) on May 15, 2015 at 22:31 UTC
    Thanks for the $|. It definitely helps put things in perspective.

    I maintain that the other processes are, indeed, waiting behind the looping process to be killed. Here's why.

    All I did was change the TIMEOUT constant from 120 to 30.

    Here's the output at 120 seconds:

    Fri May 15 11:29:37 2015 Started ==> test1.sh Fri May 15 11:29:37 2015 Started ==> test2.sh Fri May 15 11:29:37 2015 Started ==> test3.sh Fri May 15 11:29:37 CDT 2015 I am running test1.sh Fri May 15 11:29:37 CDT 2015 I am running test2.sh Fri May 15 11:29:38 CDT 2015 I am running test3.sh Fri May 15 11:29:37 2015 Ended ==> test3.sh Fri May 15 11:29:37 2015 Started ==> test4.sh Fri May 15 11:29:39 CDT 2015 I am running test4.sh Fri May 15 11:29:37 2015 Ended ==> test2.sh Fri May 15 11:29:37 2015 Ended ==> test1.sh Fri May 15 11:29:37 2015 Started ==> test5.sh Fri May 15 11:29:40 CDT 2015 I am running test5.sh Fri May 15 11:29:37 2015 Ended ==> test5.sh Fri May 15 11:29:49 CDT 2015 I am running test4.sh Fri May 15 11:29:59 CDT 2015 I am running test4.sh Fri May 15 11:30:09 CDT 2015 I am running test4.sh Fri May 15 11:30:19 CDT 2015 I am running test4.sh Fri May 15 11:30:29 CDT 2015 I am running test4.sh Fri May 15 11:30:39 CDT 2015 I am running test4.sh Fri May 15 11:30:49 CDT 2015 I am running test4.sh Fri May 15 11:30:59 CDT 2015 I am running test4.sh Fri May 15 11:31:09 CDT 2015 I am running test4.sh Fri May 15 11:31:19 CDT 2015 I am running test4.sh Fri May 15 11:31:29 CDT 2015 I am running test4.sh Fri May 15 11:31:39 CDT 2015 I am running test4.sh Fri May 15 11:29:37 2015 Ended ==> test4.sh Fri May 15 11:31:51 2015 Started ==> test1.sh Fri May 15 11:31:51 2015 Started ==> test2.sh Fri May 15 11:31:51 2015 Started ==> test3.sh Fri May 15 11:31:52 CDT 2015 I am running test1.sh Fri May 15 11:31:52 CDT 2015 I am running test2.sh Fri May 15 11:31:52 CDT 2015 I am running test3.sh Fri May 15 11:31:51 2015 Ended ==> test3.sh Fri May 15 11:31:51 2015 Ended ==> test1.sh Fri May 15 11:31:51 2015 Ended ==> test2.sh
    Here's the output at 30 seconds:
    Fri May 15 12:58:08 2015 Started ==> test1.sh Fri May 15 12:58:08 2015 Started ==> test2.sh Fri May 15 12:58:08 2015 Started ==> test3.sh Fri May 15 12:58:08 CDT 2015 I am running test1.sh Fri May 15 12:58:09 CDT 2015 I am running test2.sh Fri May 15 12:58:09 CDT 2015 I am running test3.sh Fri May 15 12:58:08 2015 Ended ==> test2.sh Fri May 15 12:58:08 2015 Ended ==> test1.sh Fri May 15 12:58:08 2015 Ended ==> test3.sh Fri May 15 12:58:08 2015 Started ==> test4.sh Fri May 15 12:58:08 2015 Started ==> test5.sh Fri May 15 12:58:11 CDT 2015 I am running test4.sh Fri May 15 12:58:11 CDT 2015 I am running test5.sh Fri May 15 12:58:08 2015 Ended ==> test5.sh Fri May 15 12:58:21 CDT 2015 I am running test4.sh Fri May 15 12:58:31 CDT 2015 I am running test4.sh Fri May 15 12:58:41 CDT 2015 I am running test4.sh Fri May 15 12:58:08 2015 Ended ==> test4.sh Fri May 15 12:58:53 2015 Started ==> test1.sh Fri May 15 12:58:53 2015 Started ==> test2.sh Fri May 15 12:58:53 2015 Started ==> test3.sh Fri May 15 12:58:53 CDT 2015 I am running test1.sh Fri May 15 12:58:53 CDT 2015 I am running test2.sh Fri May 15 12:58:53 CDT 2015 I am running test3.sh Fri May 15 12:58:53 2015 Ended ==> test2.sh Fri May 15 12:58:53 2015 Ended ==> test1.sh Fri May 15 12:58:53 2015 Ended ==> test3.sh
    If it were true that the other two forkable processes were processing work as soon as it's available, why does it take over 2 minutes for test1-3,5 to run when the test4 timeout is 120 but only a bit over 30 secs when it's set to 30?
    This tells me that everything is waiting for the last process to finish in order to satisfy the wait_all_children. I would expect test4 to take TIMEOUT seconds to complete, but I would also expect test1-3,5 to run as soon as a forkable process is available (i.e., after the 10 second sleep in the while loop).

    What am I missing here?

      I maintain that the other processes are, indeed, waiting behind the looping process to be killed.

      Yes, just like you asked. Wait for all five processes to end/hang before sleeping ten seconds and re-running all five programs again.

      In both of the cases, there's less than 12s between the time the first batch of five finishes and the next one starts, so it's working fine.


      Why does it say test4.sh ended at 11:29:37 when it was still producing output at 11:31:39?

      Also, again, it would really help if you printed when the processes exited if you're going to keep discussing this.