in reply to controling the num of fork sessions

I don't quite understand how forking your script will permit it to run on multiple boxes. Multiple CPUs, OK, but how do you cause it to run on another system? What you are describing is inconsistent with what you are asking and with your sample code.

If you're asking how to keep 20 child processes active at any given time, keep track of them and use waitpid to see how many are gone:

use POSIX ':sys_wait_h';

my $max_children = 20;
my $cur_children = 0;

# Reap every child that has exited, then top the pool back up.
$SIG{CHLD} = sub {
    # waitpid returns 0 while children are still running and -1 when
    # none are left, so only a positive return value is a reaped child.
    $cur_children-- while waitpid(-1, WNOHANG) > 0;
    spawn_children();
};

sub spawn_children {
    while ($cur_children < $max_children) {
        my $pid = fork;
        die "fork: $!" unless defined $pid;
        child_process() if !$pid;
        $cur_children++;
    }
}

sub child_process {
    # what the kid does
    exit 0;    # important
}

spawn_children();

But like I said, this makes no sense with respect to processes running on other systems. You may have to give us more information if this doesn't help you.

Re: Re: controling the num of fork sessions
by mikfire (Deacon) on Nov 27, 2000 at 19:51 UTC
    A nice solution, but I would suggest ( as I almost always do to this kind of question ) a few improvements.

    Why are you going through all this pain with REAPERs and counters? Handle the child's death yourself in the main loop of the code. This removes some complexity and also gets around perl's rarely seen interrupt problems ( honestly, I have never seen this interrupt problem in about 6 years of perl programming, but that is merely anecdotal ).

    What I would suggest ( and have used several times ) is a loop like this:

    while ( $again ) {
        #--
        # Initial loop to spawn children
        #--
        if ( $ref < $max ) {
            if ( $cpid = fork ) {
                $kids{ $cpid } = 1;
            } else {
                # do interesting process here
                exit;
            }
            $ref++;
        } else {
            # Poll every half second; the 1 is the usual numeric value of WNOHANG
            do {
                select undef, undef, undef, 0.5;
                $dpid = waitpid( -1, 1 );
            } until $dpid;

            last if ( $dpid == -1 ); # No children left

            delete $kids{$dpid};
            if ( $cpid = fork ) {
                $kids{ $cpid } = 1;
            } else {
                # Same interesting process
                exit;
            }
        }
        $total++;
    }

    There is some complexity here - I was using this to seriously abuse some cycles :) The if() portion merely checks to see if all the children have been spawned. If they haven't, spawn another off and log the creation into a hash.

    If all the children have been spawned, poll the system every 1/2 second until one dies. I make sure I have a real pid, dropping out of the loop if not, and remove that entry from the tracking hash. This hash can be used to log when a child has died and what it was doing. Spawn another child off and loop again.

    Because I am handling the deaths myself and not waiting for quasi-mystical signals, even if 100 children die at the same time, each child will remain in the process table until I have processed it. I can then be certain that I will spawn 100 more children off, no matter how or when they die.
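
    As a rough illustration of the tracking-hash idea, here is a minimal sketch in which the hash maps each pid to the task it was given, so the parent can log what a child was doing when it exits. The run_pool helper, the task names, and the use of the task-name length as a stand-in exit code are all made up for the example; they are not part of the loop above.

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';

# Hypothetical helper: run each task in its own child, at most $max at a time.
# Returns a hash mapping task name => child exit code.
sub run_pool {
    my ($max, @tasks) = @_;
    my (%kids, %result);

    while (@tasks or %kids) {
        # Top up the pool while there is work left and room for another child.
        while (@tasks and keys(%kids) < $max) {
            my $task = shift @tasks;
            my $pid  = fork;
            die "fork: $!" unless defined $pid;
            if ($pid == 0) {
                # Child: stand-in for the real work (assumption for this sketch).
                exit length $task;
            }
            $kids{$pid} = $task;    # remember what this child is doing
        }

        # Poll without blocking; nap half a second if nobody has exited yet.
        my $dpid = waitpid(-1, WNOHANG);
        if ($dpid > 0) {
            my $task = delete $kids{$dpid};
            $result{$task} = $? >> 8;    # exit code of the reaped child
        }
        elsif ($dpid == 0) {
            select undef, undef, undef, 0.5;
        }
        else {
            last;    # -1: no children left
        }
    }
    return %result;
}

my %done = run_pool(2, qw(a bb ccc));
print "$_ exited with $done{$_}\n" for sort keys %done;
```

    Because each exited child stays in the process table until waitpid collects it, it does not matter to this loop whether the children die one at a time or all at once.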

    Oh. This loop was actually run as a child process - my loop exited when the signal handler set $again to false. You can replace this with a variety of exit conditions - timeouts, all the children have been reaped, etc.

    mikfire

      I was using the signal approach so as to free up the parent process for other things. The use of waitpid in a loop as I was doing should also catch all 100 children if they die simultaneously (under the announcement of a single signal). The only problem might be with unreliable signals. It's just a preference thing. I would rather not devote a process solely to the task of keeping track of children. Signals (or I guess a more complex event loop) let me do that while allowing me to work on other things at the same time.
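
      A small sketch of that claim, assuming a perl with deferred ("safe") signals: several children exit at nearly the same moment, and the waitpid loop in the handler collects all of them even if fewer SIGCHLDs are delivered than children died. The spawn_and_count helper and the 10-second safety deadline are inventions for this example only.

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';

my $reaped = 0;

# One SIGCHLD may stand for several exits, so keep reaping until nothing is ready.
$SIG{CHLD} = sub {
    local ($!, $?);    # do not clobber the main program's error/status variables
    $reaped++ while waitpid(-1, WNOHANG) > 0;
};

# Hypothetical helper: fork $n children that exit at once, return how many were reaped.
sub spawn_and_count {
    my ($n) = @_;
    $reaped = 0;
    for (1 .. $n) {
        my $pid = fork;
        die "fork: $!" unless defined $pid;
        exit 0 if $pid == 0;    # child: exit immediately
    }

    # The parent is free to do other work here; this sketch just naps in
    # short slices until the handler has counted everyone (or 10 s pass).
    my $deadline = time + 10;
    while ($reaped < $n and time < $deadline) {
        select undef, undef, undef, 0.1;
    }
    return $reaped;
}

print "reaped ", spawn_and_count(5), " of 5 children\n";
```

      The handler only counts; anything heavier, such as spawning replacements, could be done from the main program once it notices the count has changed.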

      I really don't see it as a pain keeping track of stuff like that. You're using a hash of PIDs to keep track of your kids; I was going under the assumption that a count was adequate. TMTOWTDI.

Re: Re: controling the num of fork sessions
by Anonymous Monk on Nov 27, 2000 at 05:26 UTC

    sorry,

    In my example I have a print. In the full .pl I have a telnet function that logs into individual boxes at that point and runs an EXE. Thanks for the info on waitpid. That may solve my problem.

    BRN