in reply to Re^3: Forking child processes.
in thread Forking child processes.

I am leaning towards #2. I've verified that all child processes are getting started, and I print the ident from run_on_finish(). Output from some of the callbacks is definitely missing. I'll continue to debug and get to the bottom of this. I'll also try your solution, which will obviate the need for the alarm signal handler. Thanks.

Re^5: Forking child processes.
by sojourn548 (Acolyte) on Sep 19, 2009 at 03:32 UTC

    To debug this problem, I've trimmed the code to the bare minimum, and I am now trying ikegami's solution. The subroutine tick() is now called every 10 seconds without the need for the signal handler.

    However, I am still seeing issues with the callback not being called for every forked process. In this example, only the first process gets the callback. Here is the sample code that I am testing. I am hoping that it's something trivial that I've overlooked. Maybe a good night's sleep will help me find the problem. Please let me know if you see something that I am not doing correctly. Assuming you have the modules installed, this code should run as is.

        use strict;
        use warnings;
        use POSIX qw( _exit );
        use Time::HiRes qw( sleep time );
        use constant PERIOD => 10.0;
        use Parallel::ForkManager;

        my @servers = (
            { id => "1", name => "srv1", },
            { id => "2", name => "srv2", },
            { id => "3", name => "srv3", }
        );

        sub tick {
            my $MAX_PROCESSES = 1000;
            my $pm = new Parallel::ForkManager($MAX_PROCESSES);
            my @ping_cmd = ("ping", "testhost", "-s 5", "-c 1");

            $pm->run_on_finish(
                sub {
                    my ($pid, $exit_code, $ident) = @_;
                    my ($check_id, $host) = $ident =~ /^(.*?) on (.*)/s;
                    print "^^^^^^^^^^^run_on_finish: $ident (pid: $pid) exited with code: [$exit_code] host: [$host]\n";
                }
            );

            $pm->run_on_start(
                sub {
                    my ($pid,$ident)=@_;
                    print "** $ident started, pid: $pid\n";
                }
            );

            for(my $i=0; $i< @servers; $i++){
                my $srv_id = $servers[$i]{id};
                my $srv_name = $servers[$i]{name};
                $ping_cmd[1] = $servers[$i]{name};
                print "$srv_name:\n";

                $pm->start("ping on $srv_id") and next;

                exec(@ping_cmd);
                print(STDERR "ping on $srv_name exec failed: $!\n");
                _exit($!);
            }
        }

        sub sleep_till {
            my ($sleep_till) = @_;
            for (;;) {
                my $duration = $sleep_till - time();
                last if $duration <= 0;
                sleep($duration);
            }
        }

        {
            $SIG{CHLD} = 'IGNORE';  # Autoreap children

            my $time = time();
            for (;;) {
                $time += PERIOD;
                sleep_till($time);

                my $pid = fork();
                if (!defined($pid)) {
                    warn("fork: $!\n");
                } elsif (!$pid) {
                    $SIG{CHLD} = 'DEFAULT';
                    _exit(0) if eval { tick(); 1 };
                    print STDERR $@;
                    _exit($! || ($?>>8) || 255);
                }
            }
        }

      I think what's happening is that the process that calls tick() exits before the three ping processes started via exec(@ping_cmd) have a chance to finish, so only some of the run_on_finish() callbacks get a chance to run.

      The callback now runs for ALL child processes once I call $pm->wait_all_children;

      Since I am waiting on all children, I am probably not creating any defunct processes from tick(). That doesn't really matter, though, since without wait_all_children the tick() process exited immediately and any defunct child processes would have been inherited by init. Is this correct?

          sub tick {
              my $MAX_PROCESSES = 1000;
              my $pm = new Parallel::ForkManager($MAX_PROCESSES);
              my @ping_cmd = ("ping", "testhost", "-s 5", "-c 1");

              $pm->run_on_finish(
                  sub {
                      my ($pid, $exit_code, $ident) = @_;
                      my ($check_id, $host) = $ident =~ /^(.*?) on (.*)/s;
                      print "^^^^^^^^^^^run_on_finish: $ident (pid: $pid) exited with code: [$exit_code] host: [$host]\n";
                  }
              );

              $pm->run_on_start(
                  sub {
                      my ($pid,$ident)=@_;
                      print "** $ident started, pid: $pid\n";
                  }
              );

              for(my $i=0; $i< @servers; $i++){
                  my $srv_id = $servers[$i]{id};
                  my $srv_name = $servers[$i]{name};
                  $ping_cmd[1] = $servers[$i]{name};
                  print "$srv_name:\n";

                  $pm->start("ping on $srv_id") and next;

                  exec(@ping_cmd);
                  print(STDERR "ping on $srv_name exec failed: $!\n");
                  _exit($!);
              }

              print "Waiting for all pings...\n";
              $pm->wait_all_children;
              print "Tick finished.\n";
          }
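
      For anyone who wants to see the mechanism without Parallel::ForkManager, here is a rough sketch using only core fork() and waitpid(). It is not how the module is implemented; run_and_reap() and its $on_finish argument are names I made up for illustration, and the children just _exit() with a fixed code as a stand-in for exec(@ping_cmd). The point is only that a "finish" callback can fire solely when the parent is still around to collect the child's status.

```perl
use strict;
use warnings;
use POSIX qw( _exit );

# Hypothetical helper (not part of Parallel::ForkManager): fork one child
# per exit code, then block until every child has been reaped, calling
# $on_finish->($pid, $exit_code) for each one.
sub run_and_reap {
    my ($on_finish, @exit_codes) = @_;

    # Make sure children are not auto-reaped, or waitpid() has nothing to collect.
    local $SIG{CHLD} = 'DEFAULT';

    my %pending;
    for my $code (@exit_codes) {
        my $pid = fork();
        die "fork: $!\n" if !defined($pid);
        if (!$pid) {
            _exit($code);    # Child: stand-in for exec(@ping_cmd).
        }
        $pending{$pid} = 1;
    }

    # Without this loop the parent would return (and possibly exit) before
    # any child status had been collected, so no callback would ever run.
    while (%pending) {
        my $pid = waitpid(-1, 0);
        last if $pid == -1;                # No children left to wait for.
        next if !delete $pending{$pid};    # Not one of the children started here.
        $on_finish->($pid, $? >> 8);
    }
}

my @seen;
run_and_reap(
    sub {
        my ($pid, $exit_code) = @_;
        push @seen, $exit_code;
        print "pid $pid exited with code $exit_code\n";
    },
    0, 1, 2,
);
```

      If the while loop is removed, run_and_reap() returns right after forking and the callback never runs, which matches what I saw before adding wait_all_children.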