in reply to Forking child processes.

$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident) = @_;
    my ($action, $host) = $ident =~ /^(.*?) on (.*)/s;
    printf "run_on_finish: %s (pid: %s) exited with code: %s\n",
        $ident, $pid, $exit_code;
    insert_into_db(\$dbh, $host, $action, $exit_code);
});
plus
for my $host (@hosts) {
    if (!$pm->start("cmd1 on $host")) {
        exec(@cmd1, $host);
        print(STDERR "cmd1 exec failed: $!\n");
        _exit($!);
    }

    if (!$pm->start("cmd2 on $host")) {
        exec(@cmd2, $host);
        print(STDERR "cmd2 exec failed: $!\n");
        _exit($!);
    }
}
or
for my $host (@hosts) {
    for (
        [ cmd1 => \@cmd1 ],
        [ cmd2 => \@cmd2 ],
    ) {
        my ($action, $cmd) = @$_;
        $pm->start("$action on $host") and next;
        exec(@$cmd, $host);
        print(STDERR "$action exec failed: $!\n");
        _exit($!);
    }
}

I guess you could also pass a reference to a two-element array for ident instead of building and splitting a string.
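
A minimal sketch of that array-reference variant, assuming Parallel::ForkManager hands the ident back to run_on_finish unchanged (worth checking against your installed version). The host names, the %seen hash, and the finish(0) call standing in for exec are hypothetical:

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my @hosts = qw( alpha beta );    # hypothetical host names
my %seen;                        # "action/host" => exit code, filled by the callback

my $pm = Parallel::ForkManager->new(4);

$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident) = @_;
    my ($action, $host) = @$ident;    # unpack the pair, no regex needed
    $seen{"$action/$host"} = $exit_code;
});

for my $host (@hosts) {
    for my $action (qw( cmd1 cmd2 )) {
        # The ident is an array reference rather than an "ACTION on HOST" string.
        $pm->start([ $action, $host ]) and next;
        $pm->finish(0);    # stand-in for exec(@cmd, $host) in the real code
    }
}

$pm->wait_all_children;
printf "%d callbacks ran\n", scalar keys %seen;
```

This also avoids mis-parsing when an action name happens to contain " on ".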

Re^2: Forking child processes.
by sojourn548 (Acolyte) on Sep 16, 2009 at 16:40 UTC

    Thanks. I am using a slightly modified version of the last snippet of code, but it seems like the callback function run_on_finish() doesn't get called for SOME of the forked processes.

    The processes are definitely spawned and the external commands are run, and I can see the output on STDOUT. I am spawning about 50 processes at a time, every 10 seconds. More precisely, I am spawning 1 process every 10 seconds, which then in turn forks 50 processes to do the work.

    If run_on_finish() were never called, I would suspect a programming error, but the failure is sporadic. That of course doesn't rule out an error on my part, just a more subtle one that has left me a little puzzled.

    Here's the simplified code structure:

    $SIG{'ALRM'} = 'sig_handler';
    setitimer(ITIMER_REAL, 1, 10);
    while (1){}

    sub sig_handler {
        if ($pid = fork){
        }else{
            my $MAX = 1000; # just for testing
            my $pm = new Parallel::ForkManager($MAX);
            my $dbh = DBI->connect("DBI:mysql:database=xxxxxx;host=SOMEHOST",
                                   "user", "passwd", {'RaiseError' => 1});

            $pm->run_on_finish(
                sub {
                    my ($pid, $exit_code, $ident) = @_;
                    my ($check_id, $host) = $ident =~ /^(.*?) on (.*)/s;
                    insert_result(\$dbh, $host, $check_id, $exit_code);
                }
            );

            $pm->run_on_start(
                sub {
                    my ($pid,$ident)=@_;
                    print "** $ident started, pid: $pid\n";
                }
            );

            for(my $i=0; $i< @servers; $i++){
                my $srv_id   = $servers[$i]{id};
                my $srv_name = $servers[$i]{name};
                for my $check_id (@{$checks{$srv_id}}){
                    $pm->start("$check_id on $srv_id") and next;
                    exec(@{$commands{$check_id}});
                    print(STDERR "$check_id on $srv_name exec failed: $!\n");
                    _exit($!);
                }
            }
            $dbh->disconnect();
            exit();
        }
    }

    Please let me know if you have any suggestions. The reason I am spawning a child to call Parallel::ForkManager is that I want to ensure that a process is spawned exactly every 10 seconds. That process then does the work of spawning one process per check on each of X hosts, where X can grow depending on how far this scales (to be found by experiment). Thanks.

      • while (1) {}
        is a wasteful way of doing
        sleep while 1;

      • You never collect the child processes you create in the outer parent. You'll accumulate zombies and run out of resources.

      • Using signals adds fragility, and they're not necessary here.
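
      If the outer parent wants the children's exit codes rather than just getting rid of them, the second point could be handled with a non-blocking waitpid loop. A minimal sketch; reap_finished and the three dummy children are made up for illustration:

      ```perl
      use strict;
      use warnings;
      use POSIX qw( WNOHANG _exit );

      # Collect every child that has already exited, without blocking.
      # Returns a list of [ pid, exit_code ] pairs (possibly empty).
      sub reap_finished {
          my @reaped;
          while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
              push @reaped, [ $pid, $? >> 8 ];
          }
          return @reaped;
      }

      # Start three short-lived children that exit with codes 1, 2 and 3.
      my %expected;
      for my $code (1 .. 3) {
          my $pid = fork();
          die "fork: $!\n" if !defined $pid;
          _exit($code) if !$pid;
          $expected{$pid} = $code;
      }

      # The parent polls between other work; here it only naps briefly.
      my @done;
      while (@done < 3) {
          push @done, reap_finished();
          select(undef, undef, undef, 0.05) if @done < 3;
      }

      printf "pid %d exited with code %d\n", @$_ for @done;
      ```

      Calling reap_finished once per iteration of the main loop keeps zombies from piling up without involving a signal handler.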

      I'd try the following:

      #!/usr/bin/perl

      use strict;
      use warnings;

      use POSIX       qw( _exit );
      use Time::HiRes qw( sleep time );  # Optional.

      use constant PERIOD => 10.0;

      sub tick {
          my $pm = ...;
          ...
      }

      sub sleep_till {
          my ($sleep_till) = @_;
          for (;;) {
              my $duration = $sleep_till - time();
              last if $duration <= 0;
              sleep($duration);
          }
      }

      {
          $SIG{CHLD} = 'IGNORE';  # Autoreap children

          my $time = time();
          for (;;) {
              $time += PERIOD;
              sleep_till($time);

              my $pid = fork();
              if (!defined($pid)) {
                  warn("fork: $!\n");
              }
              elsif (!$pid) {
                  $SIG{CHLD} = 'DEFAULT';
                  _exit(0) if eval { tick(); 1 };
                  print STDERR $@;
                  _exit($! || ($?>>8) || 255);
              }
          }
      }

        ikegami, question about the sleep_till() function. I am sure there is a good reason, but why wouldn't you just do:

        sub sleep_till {
            my ($sleep_till) = @_;
            my $duration = $sleep_till - time();
            if ($duration > 0){
                sleep($duration);
            }
        }
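
        One likely reason for the loop, though the thread does not state it: sleep returns early when a signal handler runs, so a single call can wake up before the deadline. A small sketch comparing the two versions, where sleep_once is a hypothetical name for the single-call variant and a short alarm stands in for any stray signal:

        ```perl
        use strict;
        use warnings;
        use Time::HiRes qw( sleep time alarm );

        # Single call: returns as soon as sleep() is interrupted.
        sub sleep_once {
            my ($sleep_till) = @_;
            my $duration = $sleep_till - time();
            sleep($duration) if $duration > 0;
        }

        # Loop: goes back to sleep until the deadline has really passed.
        sub sleep_till {
            my ($sleep_till) = @_;
            for (;;) {
                my $duration = $sleep_till - time();
                last if $duration <= 0;
                sleep($duration);
            }
        }

        $SIG{ALRM} = sub { };    # an empty handler is enough to interrupt sleep()

        my $start = time();
        alarm(0.2);              # signal arrives well before the 1 s deadline
        sleep_once($start + 1);
        my $once = time() - $start;

        $start = time();
        alarm(0.2);
        sleep_till($start + 1);
        my $looped = time() - $start;

        printf "single call slept %.2fs, loop slept %.2fs\n", $once, $looped;
        ```

        With the single call, the next tick would fire early whenever a signal happened to arrive during the wait.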

      it seems like the callback function run_on_finish() doesn't get called for SOME of the forked processes.

      As I see it,

      • either the child was never started in the first place,
      • P::FM wasn't told its child ended (signal handling problem?), or
      • you're mistaken about run_on_finish not getting called.

      It could also be a bug in P::FM, but I seem to remember the code being simple and straightforward.
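
      The second possibility can be illustrated without P::FM. On most Unix platforms, a process that has SIGCHLD set to 'IGNORE' gets -1 back from waitpid, so the exit status never reaches it. A minimal sketch, with wait_result as a made-up helper name:

      ```perl
      use strict;
      use warnings;
      use POSIX qw( _exit );

      # Fork a child that exits with code 3 and try to collect it under the
      # given SIGCHLD disposition. Returns the exit code, or undef when
      # waitpid could not report on the child.
      sub wait_result {
          my ($chld_setting) = @_;
          local $SIG{CHLD} = $chld_setting;

          my $pid = fork();
          die "fork: $!\n" if !defined $pid;
          _exit(3) if !$pid;

          my $reaped = waitpid($pid, 0);
          return $reaped == $pid ? $? >> 8 : undef;
      }

      for my $setting (qw( DEFAULT IGNORE )) {
          my $code = wait_result($setting);
          printf "%-7s => %s\n", $setting,
              defined $code ? "exit code $code" : "status not available";
      }
      ```

      Any code that relies on waitpid to learn about finished children, which includes a fork manager, therefore needs SIGCHLD at 'DEFAULT' in the process that does the waiting.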

        I am leaning towards #2. I've verified that all child processes are getting started, and I print ident from run_on_finish(). Output from some of the callbacks is definitely missing. I'll continue to debug and get to the bottom of this. I'll also try your solution, which will obviate the need for the SIGALRM handler. Thanks.