
Thanks. I am using a slightly modified version of the last snippet of code, but the callback function run_on_finish() doesn't seem to get called for SOME of the forked processes.

The processes are definitely spawned and the external commands are run, and I can see their output on STDOUT. I am spawning about 50 processes at a time, every 10 seconds. More precisely, I spawn one process every 10 seconds, which in turn forks 50 processes to do the work.

If run_on_finish() were never called, I would suspect a programming error, but the failure seems to be sporadic. That doesn't rule out an error on my part, of course, but it is probably a more subtle one, and it has left me a little puzzled.

Here's the simplified code structure:

$SIG{'ALRM'} = 'sig_handler';
setitimer(ITIMER_REAL, 1, 10);
while (1){}

sub sig_handler {
    if ($pid = fork){
    }else{
        my $MAX = 1000; # just for testing
        my $pm = new Parallel::ForkManager($MAX);
        my $dbh = DBI->connect("DBI:mysql:database=xxxxxx;host=SOMEHOST",
                               "user", "passwd", {'RaiseError' => 1});

        $pm->run_on_finish(
            sub {
                my ($pid, $exit_code, $ident) = @_;
                my ($check_id, $host) = $ident =~ /^(.*?) on (.*)/s;
                insert_result(\$dbh, $host, $check_id, $exit_code);
            }
        );

        $pm->run_on_start(
            sub {
                my ($pid,$ident)=@_;
                print "** $ident started, pid: $pid\n";
            }
        );

        for(my $i=0; $i< @servers; $i++){
            my $srv_id = $servers[$i]{id};
            my $srv_name = $servers[$i]{name};
            for my $check_id (@{$checks{$srv_id}}){
                $pm->start("$check_id on $srv_id") and next;
                exec(@{$commands{$check_id}});
                print(STDERR "$check_id on $srv_name exec failed: $!\n");
                _exit($!);
            }
        }
        $dbh->disconnect();
        exit();
    }
}

Please let me know if you have any suggestions. The reason I spawn a child to call Parallel::ForkManager is that I want to ensure a process is spawned exactly every 10 seconds to do the work of spawning one process per host on X hosts, where X can grow depending on how far this scales (to be determined by some experimenting). Thanks.

Re^3: Forking child processes.
by ikegami (Patriarch) on Sep 16, 2009 at 17:26 UTC
    • while (1) {}
      is a wasteful way of doing
      sleep while 1;

    • You never collect the child processes you create in the outer parent. You'll accumulate zombies and run out of resources.

    • Using signals adds fragility, and they're not necessary here.
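On the zombie point: a child that has exited lingers as a zombie until its parent collects its exit status with wait() or waitpid(). A minimal sketch of non-blocking reaping in core Perl, assuming the parent can poll periodically; reap_finished() is a hypothetical helper, not part of any module:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw( WNOHANG _exit );
use Time::HiRes qw( sleep );

# Hypothetical helper: collect every child that has already exited,
# without blocking. Returns a list of pid => exit code pairs.
sub reap_finished {
    my %status;
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $status{$pid} = $? >> 8;
    }
    return %status;
}

# Demo: fork three short-lived children, then reap them all.
my %expected;    # pid => exit code the child was told to use
for my $code (1 .. 3) {
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    _exit($code) unless $pid;    # child: exit immediately with $code
    $expected{$pid} = $code;
}

my %reaped;
for (1 .. 50) {                  # poll for up to about 5 seconds
    %reaped = (%reaped, reap_finished());
    last if keys(%reaped) == keys(%expected);
    sleep(0.1);
}
printf "reaped %d of %d children\n", scalar(keys %reaped), scalar(keys %expected);
```

Setting $SIG{CHLD} = 'IGNORE', as the code that follows does, is the lighter-weight alternative: on systems that support it (Linux among them) children are then reaped automatically, at the cost of no longer being able to read their exit status.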

    I'd try the following:

    #!/usr/bin/perl

    use strict;
    use warnings;

    use POSIX       qw( _exit );
    use Time::HiRes qw( sleep time );  # Optional.

    use constant PERIOD => 10.0;

    sub tick {
        my $pm = ...;
        ...
    }

    sub sleep_till {
        my ($sleep_till) = @_;
        for (;;) {
            my $duration = $sleep_till - time();
            last if $duration <= 0;
            sleep($duration);
        }
    }

    {
        $SIG{CHLD} = 'IGNORE';  # Autoreap children

        my $time = time();
        for (;;) {
            $time += PERIOD;
            sleep_till($time);

            my $pid = fork();
            if (!defined($pid)) {
                warn("fork: $!\n");
            }
            elsif (!$pid) {
                $SIG{CHLD} = 'DEFAULT';
                _exit(0) if eval { tick(); 1 };
                print STDERR $@;
                _exit($! || ($?>>8) || 255);
            }
        }
    }
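The loop above sleeps until an absolute deadline ($time += PERIOD) instead of sleeping PERIOD seconds after each fork, so time spent inside the loop does not push later ticks back. A scaled-down comparison, assuming a 0.2 s period and 0.1 s of simulated per-tick work (arbitrary values chosen to exaggerate the effect); run_ticks() is a hypothetical helper, and sleep_till() is copied from the code above:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw( sleep time );

use constant PERIOD => 0.2;   # scaled down from 10 s for the demo
use constant WORK   => 0.1;   # simulated per-tick work (arbitrary)

sub sleep_till {
    my ($sleep_till) = @_;
    for (;;) {
        my $duration = $sleep_till - time();
        last if $duration <= 0;
        sleep($duration);
    }
}

# Hypothetical helper: returns the offsets (from the start) at which
# each of $n ticks began, using absolute or relative scheduling.
sub run_ticks {
    my ($n, $absolute) = @_;
    my $start    = time();
    my $deadline = $start;
    my @offsets;
    for (1 .. $n) {
        if ($absolute) {
            $deadline += PERIOD;    # next deadline derived from the previous one
            sleep_till($deadline);
        } else {
            sleep(PERIOD);          # relative: each tick drifts by WORK
        }
        push @offsets, time() - $start;
        sleep(WORK);                # stand-in for the tick's own work
    }
    return @offsets;
}

my @absolute = run_ticks(3, 1);
my @relative = run_ticks(3, 0);
printf "absolute: %s\n", join(' ', map { sprintf '%.2f', $_ } @absolute);
printf "relative: %s\n", join(' ', map { sprintf '%.2f', $_ } @relative);
```

With absolute deadlines the third tick starts at roughly 0.6 s; with relative sleeps it starts at roughly 0.8 s, and the gap keeps growing with every tick.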

      ikegami, question about the sleep_till() function. I am sure there is a good reason, but why wouldn't you just do:

      sub sleep_till {
          my ($sleep_till) = @_;
          my $duration = $sleep_till - time();
          if ($duration > 0){
              sleep($duration);
          }
      }
        Signals interrupt sleep. Mind you, no signals are being handled here, so your solution would work too.
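To see the difference: once a Perl-level handler is installed, an arriving signal cuts a single sleep() short, while the sleep_till() loop from ikegami's code recomputes the remaining time and goes back to sleep. A small sketch, assuming a Unix-like system such as Linux and using SIGALRM from a one-shot Time::HiRes setitimer() as a stand-in for "some handled signal" (the Time::HiRes documentation notes that the interaction between alarms and sleeps is unspecified in general):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw( sleep time setitimer ITIMER_REAL );

sub sleep_till {
    my ($sleep_till) = @_;
    for (;;) {
        my $duration = $sleep_till - time();
        last if $duration <= 0;
        sleep($duration);
    }
}

my $signals = 0;
$SIG{ALRM} = sub { $signals++ };   # a handled signal interrupts sleep

# Single sleep: a one-shot timer fires after 0.3 s and ends the sleep early.
setitimer(ITIMER_REAL, 0.3);
my $t0 = time();
sleep(1.0);
my $single = time() - $t0;

# Loop: the same interruption happens, but sleep_till() sleeps again.
setitimer(ITIMER_REAL, 0.3);
$t0 = time();
sleep_till($t0 + 1.0);
my $looped = time() - $t0;

printf "single sleep: %.2f s, sleep_till loop: %.2f s, signals: %d\n",
    $single, $looped, $signals;
```

With $SIG{CHLD} = 'IGNORE' and no other handlers, nothing interrupts the sleep, which is why the simpler version also works in that particular program.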
Re^3: Forking child processes.
by ikegami (Patriarch) on Sep 16, 2009 at 21:20 UTC

    it seems like the callback function run_on_finish() doesn't get called for SOME of the forked processes.

    As I see it,

    • either the child was never started in the first place,
    • P::FM wasn't told its child ended (signal handling problem?), or
    • you're mistaken about run_on_finish not getting called.

    It could also be a bug in P::FM, but I seem to remember the code being simple and straightforward.

      I am leaning towards #2. I've verified that all child processes are getting started, and I print the ident from run_on_finish(). Some of the callback output is definitely missing. I'll continue to debug and get to the bottom of this. I'll also try your solution, which removes the need for the SIGALRM handler. Thanks.

        To debug this problem, I've trimmed the code to the bare minimum, and I am now trying ikegami's solution. The subroutine tick() is now called every 10 seconds without the need for the signal handler.

        However, I am still seeing issues with the callback not being called for every forked process. In this example, only the first process gets the callback. Here is the sample code that I am testing. I am hoping that it's something trivial that I've overlooked. Maybe a good night's sleep will help me find the problem. Please let me know if you see something that I am not doing correctly. Assuming you have the modules installed, this code should run as is.

        use strict;
        use warnings;

        use POSIX       qw( _exit );
        use Time::HiRes qw( sleep time );
        use constant PERIOD => 10.0;
        use Parallel::ForkManager;

        my @servers = (
            { id => "1", name => "srv1", },
            { id => "2", name => "srv2", },
            { id => "3", name => "srv3", }
        );

        sub tick {
            my $MAX_PROCESSES = 1000;
            my $pm = new Parallel::ForkManager($MAX_PROCESSES);
            my @ping_cmd = ("ping", "testhost", "-s 5", "-c 1");

            $pm->run_on_finish(
                sub {
                    my ($pid, $exit_code, $ident) = @_;
                    my ($check_id, $host) = $ident =~ /^(.*?) on (.*)/s;
                    print "^^^^^^^^^^^run_on_finish: $ident (pid: $pid) exited with code: [$exit_code] host: [$host]\n";
                }
            );

            $pm->run_on_start(
                sub {
                    my ($pid,$ident)=@_;
                    print "** $ident started, pid: $pid\n";
                }
            );

            for(my $i=0; $i< @servers; $i++){
                my $srv_id = $servers[$i]{id};
                my $srv_name = $servers[$i]{name};
                $ping_cmd[1] = $servers[$i]{name};
                print "$srv_name:\n";
                $pm->start("ping on $srv_id") and next;
                exec(@ping_cmd);
                print(STDERR "ping on $srv_name exec failed: $!\n");
                _exit($!);
            }
        }

        sub sleep_till {
            my ($sleep_till) = @_;
            for (;;) {
                my $duration = $sleep_till - time();
                last if $duration <= 0;
                sleep($duration);
            }
        }

        {
            $SIG{CHLD} = 'IGNORE';  # Autoreap children

            my $time = time();
            for (;;) {
                $time += PERIOD;
                sleep_till($time);

                my $pid = fork();
                if (!defined($pid)) {
                    warn("fork: $!\n");
                }
                elsif (!$pid) {
                    $SIG{CHLD} = 'DEFAULT';
                    _exit(0) if eval { tick(); 1 };
                    print STDERR $@;
                    _exit($! || ($?>>8) || 255);
                }
            }
        }
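One thing worth checking in both versions: tick() (and sig_handler() in the first version) returns after the last start() without ever calling $pm->wait_all_children. Parallel::ForkManager runs the run_on_finish callback in the parent only when it reaps the child, i.e. inside wait_all_children() or, as far as I know, opportunistically during later start() calls. Once the tick process exits, children that were still running are never reported, which may explain why only some callbacks show up. The sketch below imitates that mechanism in core Perl rather than using Parallel::ForkManager itself; start_child(), on_finish() and this wait_all_children() are hypothetical stand-ins, and _exit() replaces exec(@ping_cmd):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw( _exit );

# Hypothetical miniature of a fork manager: it remembers each child's
# ident and runs the finish callback only when it reaps that child.
my %ident_of;    # pid => ident
my @finished;    # idents whose callback has run

sub on_finish {
    my ($pid, $exit_code, $ident) = @_;
    push @finished, $ident;
    print "finished: $ident (pid $pid) exit $exit_code\n";
}

sub start_child {
    my ($ident, $exit_code) = @_;
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    _exit($exit_code) unless $pid;    # child: stand-in for exec(@ping_cmd)
    $ident_of{$pid} = $ident;
}

# Same idea as $pm->wait_all_children: block until every child has
# been reaped, running the callback for each one.
sub wait_all_children {
    while (%ident_of) {
        my $pid = waitpid(-1, 0);
        last if $pid == -1;           # no children left to wait for
        next unless exists $ident_of{$pid};
        on_finish($pid, $? >> 8, delete $ident_of{$pid});
    }
}

start_child("ping on $_", 0) for 1 .. 3;
wait_all_children();   # without this call, no callback would run at all
```

In the real tick(), the corresponding change would be a single $pm->wait_all_children; call after the for loop (in the first version, before $dbh->disconnect(), so that insert_result() still has a live database handle).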