sojourn548 has asked for the wisdom of the Perl Monks concerning the following question:
With help from ikegami on using Parallel::ForkManager, I am able to efficiently parallelize several child processes, where each child execs one program and the callback function inserts the exit code into a database, as shown in the following code snippet:
    $pm->run_on_finish( sub {
        my ($pid, $exit_code, $ident) = @_;
        print "run_on_finish: $ident (pid: $pid) exited with code: [$exit_code]\n";
        insert_into_db(\$dbh, $pid, $exit_code);
    } );

    for my $host (@hosts) {
        $pm->start($host) and next;
        exec(@some_command);
        print(STDERR "exec failed: $!\n");
        _exit($!);
    }
Now I am trying to extend this concept and run 2 or more external processes per host from the @hosts array, returning the exit codes to the parent (to insert them into the database).
Since run_on_finish() only receives a single exit code per child, and a child can exec() at most one command (so to run two or more processes from the child I may have to go back to system()), what is the best approach for returning 2 or more exit codes to the parent? I am wondering if I need a global hash or array to keep track of the exit codes and delete entries after they get inserted into the database.
Another approach would be to parallelize over the external commands themselves. In other words, build a @host_run_command array and fork one child per (host, command) pair, but that could double or triple the number of parallel processes if there are two or three external commands to run per host. I'd like to keep the degree of parallelization down and run the commands for a given host serially.
We are talking about up to 500 hosts here, or more, which could mean 1500 child processes if there are 3 commands to be run per host, launched every few seconds.
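For the record, since version 0.7.6 Parallel::ForkManager lets a child hand an arbitrary data structure back to the parent via the second argument to finish(); it arrives as the sixth argument of the run_on_finish() callback. Here is a minimal sketch of the serial-commands-per-host idea built on that (the hostnames, the true/false commands, and the %all hash are placeholders for illustration, not the real job list):

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my @hosts    = qw(hostA hostB);            # placeholder hostnames
my @commands = ( ['true'], ['false'] );    # placeholder commands (exit 0 and 1)

my %all;    # parent-side: host => { command => exit code }

my $pm = Parallel::ForkManager->new(20);

$pm->run_on_finish( sub {
    # The sixth argument is the reference the child passed to finish().
    my ($pid, $exit_code, $ident, $signal, $core, $data_ref) = @_;
    $all{$ident} = $data_ref if $data_ref;
} );

for my $host (@hosts) {
    $pm->start($host) and next;
    my %rc;
    for my $cmd (@commands) {       # run this host's commands serially
        system(@$cmd);
        $rc{"@$cmd"} = $? >> 8;     # record each command's exit code
    }
    $pm->finish(0, \%rc);           # ship all exit codes back to the parent
}
$pm->wait_all_children;

print "$_: true=$all{$_}{true} false=$all{$_}{false}\n" for sort keys %all;
```

The module serializes the reference through a temporary file, so it is best kept small; with this shape, each run_on_finish() call could do one database insert per command instead of one per child, avoiding any global bookkeeping.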
Is Parallel::ForkManager the right tool for this? Maybe POE?
That leads me to my second question..
What's the behavior of Parallel::ForkManager if I exceed the number of max processes I've defined? For example, if I've instantiated:
    my $pm = new Parallel::ForkManager( 20 );

    for my $host (@hosts) {
        $pm->start($host) and next;
        exec(@some_command);
        print(STDERR "exec failed: $!\n");
        _exit($!);
    }
and there are 50 hosts in the @hosts array, does Parallel::ForkManager refuse to fork the 30 additional processes altogether, or does it wait until the first 20 finish and then spawn the next 20, and then the final 10?
I have a feeling it's the former, and that I need to put checks in place so I don't exceed the limit, but I have not been able to verify this in the documentation or anywhere online.
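The behavior is easy to probe empirically. In the sketch below (the cap of 2 and the 1-second sleeps are arbitrary, chosen just to make the timing visible), five jobs are submitted under a limit of 2; if start() blocks until a slot frees rather than dropping jobs, all five complete and the run takes about three seconds (waves of 2, 2 and 1):

```perl
use strict;
use warnings;
use Time::HiRes qw(time);
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(2);    # cap at 2 concurrent children

my @done;
$pm->run_on_finish( sub {
    my ($pid, $exit, $ident) = @_;
    push @done, $ident;                    # record each completed job
} );

my $t0 = time;
for my $n (1 .. 5) {
    # Once 2 children are running, this call blocks until one exits;
    # no iteration of the loop is skipped.
    $pm->start($n) and next;
    sleep 1;                               # stand-in for the real work
    $pm->finish(0);
}
$pm->wait_all_children;

my $elapsed = time - $t0;
printf "%d of 5 jobs finished in %.1fs\n", scalar @done, $elapsed;
```

As far as the module's documentation goes, start() does block until the number of running children is below the maximum, so no extra checks should be needed to stay under the limit.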
Thanks for your feedback and patience. This forum has been invaluable, and I've learned a lot so far.
Replies are listed 'Best First'.

Re: Forking child processes.
by ikegami (Patriarch) on Sep 13, 2009 at 01:17 UTC
by sojourn548 (Acolyte) on Sep 16, 2009 at 16:40 UTC
by ikegami (Patriarch) on Sep 16, 2009 at 17:26 UTC
by sojourn548 (Acolyte) on Nov 06, 2009 at 20:29 UTC
by ikegami (Patriarch) on Nov 06, 2009 at 20:59 UTC
by ikegami (Patriarch) on Sep 16, 2009 at 21:20 UTC
by sojourn548 (Acolyte) on Sep 18, 2009 at 21:43 UTC
by sojourn548 (Acolyte) on Sep 19, 2009 at 03:32 UTC