sojourn548 has asked for the wisdom of the Perl Monks concerning the following question:

With help from ikegami on using Parallel ForkManager, I am able to efficiently parallelize running several child processes, where each child execs one program and the callback function inserts the exit code into a database, as shown in the following code snippet:

$pm->run_on_finish( sub {
    my ($pid, $exit_code, $ident) = @_;
    print "run_on_finish: $ident (pid: $pid) exited with code: [$exit_code]\n";
    insert_into_db(\$dbh, $pid, $exit_code);
} );

for my $host (@hosts) {
    $pm->start($host) and next;
    exec(@some_command);
    print(STDERR "exec failed: $!\n");
    _exit($!);
}

Now, I am trying to extend this concept and run 2 or more external processes per host from the @hosts array, returning each exit code to the parent so it can insert them into the database.

Since I can only return one exit code to run_on_finish(), and a child can exec at most one command while I need to run two or more per child (which probably means reverting to system()), what is the best approach to return 2 or more exit codes to the parent? I am wondering if I need a global hash or array to keep track of the return codes and delete them after they get inserted into the database.

Another approach would be to parallelize over the external processes that need to be run. In other words, build a @host_run_command array and fork a child for each entry, but that could double or triple the number of parallel processes if there are two or three external commands to run per host. I'd like to keep the parallelization down a bit and run the commands for each host serially.

We are talking about 500 hosts here, or more, which could mean 1500 child processes if there are 3 commands to be run per host, run at intervals of a few seconds.
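For what it's worth, recent versions of Parallel::ForkManager (0.7.6 and later) let a child hand a data structure back to the parent via finish(); run_on_finish then receives it as a sixth callback argument. If the commands are run with system() instead of exec(), one child per host can run its commands serially, collect all the exit codes in a hash, and return them in one go. A sketch under those assumptions (the hosts, commands, and the insert_into_db call are placeholders):

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(20);

# The sixth callback argument is the reference the child passed to
# finish() (requires Parallel::ForkManager 0.7.6+).
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $host, $signal, $core, $codes) = @_;
    for my $cmd (sort keys %$codes) {
        print "$host: '$cmd' exited with $codes->{$cmd}\n";
        # insert_into_db(\$dbh, $host, $cmd, $codes->{$cmd});
    }
});

my @hosts    = qw( hostA hostB );                        # placeholder hosts
my @commands = ( [ '/bin/true' ], [ '/bin/false' ] );    # placeholder commands

for my $host (@hosts) {
    $pm->start($host) and next;
    my %codes;
    for my $cmd (@commands) {
        system(@$cmd);                  # commands run serially per host
        $codes{"@$cmd"} = $? >> 8;      # shell-style exit code from $?
    }
    $pm->finish(0, \%codes);            # ship all codes back to the parent
}
$pm->wait_all_children;
```

This keeps the parallelism at one process per host while still reporting every command's exit code.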

Is Parallel::ForkManager the right tool for this? Maybe POE?

That leads me to my second question..
What's the behavior of Parallel::ForkManager if I exceed the number of max processes I've defined? For example, if I've instantiated:

my $pm = new Parallel::ForkManager( 20 );

for my $host (@hosts) {
    $pm->start($host) and next;
    exec(@some_command);
    print(STDERR "exec failed: $!\n");
    _exit($!);
}

and there are 50 hosts in the @hosts array, does Parallel::ForkManager stop forking once it hits 20 and skip the 30 additional processes entirely, or will it wait until some of the first 20 finish and then spawn more until all 50 have run?

I have a feeling that it's the former, and that I need to put checks in place not to exceed the limit, but I have not been able to verify this in the documentation or anywhere online.
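For the record, Parallel::ForkManager does not drop the extra children: start() simply blocks whenever the pool is full and resumes as soon as a running child exits, so the pool stays topped up at the limit rather than running in fixed batches. A small sketch that demonstrates this (pool of 2, five short-lived children; the sleep is just simulated work):

```perl
use strict;
use warnings;
use Parallel::ForkManager;
use POSIX qw( _exit );

my $pm = Parallel::ForkManager->new(2);   # at most 2 children at once

my $running = 0;
$pm->run_on_start(  sub { $running++; print "start:  $running running\n"; } );
$pm->run_on_finish( sub { $running--; print "finish: $running running\n"; } );

for my $job (1 .. 5) {
    # start() blocks here whenever 2 children are already running,
    # and returns once one of them has been reaped.
    $pm->start($job) and next;
    sleep 1;                              # simulate work in the child
    _exit(0);
}
$pm->wait_all_children;
```

The callbacks run in the parent, so the printed count never exceeds the limit passed to the constructor.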

Thanks for your feedback and patience. This forum has been invaluable, and I've learned a lot so far.

Replies are listed 'Best First'.
Re: Forking child processes.
by ikegami (Patriarch) on Sep 13, 2009 at 01:17 UTC
    $pm->run_on_finish(sub {
        my ($pid, $exit_code, $ident) = @_;
        my ($action, $host) = $ident =~ /^(.*?) on (.*)/s;
        printf "run_on_finish: %s (pid: %s) exited with code: %s\n",
            $ident, $pid, $exit_code;
        insert_into_db(\$dbh, $host, $action, $exit_code);
    });
    plus
    for my $host (@hosts) {
        if (!$pm->start("cmd1 on $host")) {
            exec(@cmd1, $host);
            print(STDERR "cmd1 exec failed: $!\n");
            _exit($!);
        }
        if (!$pm->start("cmd2 on $host")) {
            exec(@cmd2, $host);
            print(STDERR "cmd2 exec failed: $!\n");
            _exit($!);
        }
    }
    or
    for my $host (@hosts) {
        for ( [ cmd1 => \@cmd1 ],
              [ cmd2 => \@cmd2 ],
        ) {
            my ($action, $cmd) = @$_;
            $pm->start("$action on $host") and next;
            exec(@$cmd, $host);
            print(STDERR "$action exec failed: $!\n");
            _exit($!);
        }
    }

    I guess you could also pass a two element array for ident instead of building and splitting a string.
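    Since start() accepts any scalar as the ident, an array reference works and avoids the build-and-split regex. A minimal sketch of that variant (hosts and commands are placeholders):

    ```perl
    use strict;
    use warnings;
    use Parallel::ForkManager;
    use POSIX qw( _exit );

    my $pm = Parallel::ForkManager->new(20);

    $pm->run_on_finish(sub {
        my ($pid, $exit_code, $ident) = @_;
        my ($action, $host) = @$ident;   # unpack instead of regex-splitting
        print "run_on_finish: $action on $host (pid $pid) exited with $exit_code\n";
    });

    my @hosts = qw( hostA hostB );               # placeholder hosts
    my %cmds  = ( cmd1 => [ '/bin/true' ] );     # placeholder commands

    for my $host (@hosts) {
        for my $action (sort keys %cmds) {
            $pm->start([ $action, $host ]) and next;   # arrayref as ident
            exec(@{ $cmds{$action} }, $host);
            print STDERR "$action exec failed: $!\n";
            _exit(1);
        }
    }
    $pm->wait_all_children;
    ```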

      Thanks, I am using a slightly modified version of the last snippet, but it seems like the callback function run_on_finish() doesn't get called for SOME of the forked processes.

      The processes are definitely spawned and the external commands are run; I can see the output on STDOUT. More precisely, I am spawning one process every 10 seconds, which in turn forks about 50 processes to do the work.

      If run_on_finish() were never called at all, I would assume a programming error, but the failures are sporadic. That of course doesn't rule out an error on my part, but it's probably a subtler one, and it has left me a little puzzled.

      Here's the simplified code structure:

      $SIG{'ALRM'} = 'sig_handler';
      setitimer(ITIMER_REAL, 1, 10);
      while (1) {}

      sub sig_handler {
          if ($pid = fork) {
              # parent: nothing else to do here
          } else {
              my $MAX = 1000;  # just for testing
              my $pm = new Parallel::ForkManager($MAX);
              my $dbh = DBI->connect(
                  "DBI:mysql:database=xxxxxx;host=SOMEHOST",
                  "user", "passwd", { 'RaiseError' => 1 });
              $pm->run_on_finish( sub {
                  my ($pid, $exit_code, $ident) = @_;
                  my ($check_id, $host) = $ident =~ /^(.*?) on (.*)/s;
                  insert_result(\$dbh, $host, $check_id, $exit_code);
              } );
              $pm->run_on_start( sub {
                  my ($pid, $ident) = @_;
                  print "** $ident started, pid: $pid\n";
              } );
              for (my $i = 0; $i < @servers; $i++) {
                  my $srv_id   = $servers[$i]{id};
                  my $srv_name = $servers[$i]{name};
                  for my $check_id (@{ $checks{$srv_id} }) {
                      $pm->start("$check_id on $srv_id") and next;
                      exec(@{ $commands{$check_id} });
                      print(STDERR "$check_id on $srv_name exec failed: $!\n");
                      _exit($!);
                  }
              }
              $dbh->disconnect();
              exit();
          }
      }

      Please let me know if you have any suggestions. The reason I spawn a child to call Parallel::ForkManager is to ensure that a process starts exactly every 10 seconds to do the work of forking processes for X hosts, where X can grow depending on what it scales to (with some experimenting). Thanks.

        • while (1) {}
          is a wasteful way of doing
          sleep while 1;

        • You never collect the child processes you create in the outer parent. You'll accumulate zombies and run out of resources.

        • Using signals adds fragility, and they're not necessary here.

        I'd try the following:

        #!/usr/bin/perl
        use strict;
        use warnings;

        use POSIX       qw( _exit );
        use Time::HiRes qw( sleep time );  # Optional.

        use constant PERIOD => 10.0;

        sub tick {
            my $pm = ...;
            ...
        }

        sub sleep_till {
            my ($sleep_till) = @_;
            for (;;) {
                my $duration = $sleep_till - time();
                last if $duration <= 0;
                sleep($duration);
            }
        }

        {
            $SIG{CHLD} = 'IGNORE';  # Autoreap children
            my $time = time();
            for (;;) {
                $time += PERIOD;
                sleep_till($time);

                my $pid = fork();
                if (!defined($pid)) {
                    warn("fork: $!\n");
                }
                elsif (!$pid) {
                    $SIG{CHLD} = 'DEFAULT';
                    _exit(0) if eval { tick(); 1 };
                    print STDERR $@;
                    _exit($! || ($? >> 8) || 255);
                }
            }
        }

        it seems like the callback function run_on_finish() doesn't get called for SOME of the forked processes.

        As I see it,

        • either the child was never started in the first place,
        • P::FM wasn't told its child ended (signal handling problem?), or
        • you're mistaken about run_on_finish not getting called.

        It could also be a bug in P::FM, but I seem to remember the code being simple and straightforward.
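        One way to narrow it down is a minimal reproduction that counts callbacks. Note the wait_all_children at the end: Parallel::ForkManager only reaps children (and fires run_on_finish) inside start() and wait_all_children, so if the forking process exits while children are still running, the callbacks for those children never fire, which would look exactly like this sporadic symptom. A self-contained sketch:

        ```perl
        use strict;
        use warnings;
        use Parallel::ForkManager;
        use POSIX qw( _exit );

        my $pm = Parallel::ForkManager->new(50);

        my $finished = 0;
        $pm->run_on_finish(sub { $finished++ });

        for my $n (1 .. 50) {
            $pm->start($n) and next;
            _exit(0);                       # trivial child
        }
        # Without this, callbacks for still-running children are lost.
        $pm->wait_all_children;

        print "run_on_finish fired $finished of 50 times\n";
        ```

        If this minimal version reliably reports 50 of 50, the problem is likely in how the real program exits before all children are reaped.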