P0w3rK!d has asked for the wisdom of the Perl Monks concerning the following question:

In the depths of Perl Wisdom I ask this great Monastery to endow me with the knowledge of complex process management. Now, there is proper forking(), which Ant seems to know a lot about, and there is what I am trying to do.

Can anyone elaborate on their experiences with forking() around with Perl in a distributed, cross-platform systems environment?

Say you're given 100 processes that you have to fork(), but the order in which they finish does not necessarily run 1..100; it varies on a daily basis based on resources, the number of machines available to run scripts on, etc. Keep in mind that some of the processes may be dependent upon each other.

What is the best approach that my fellow brothers have taken to tackle this problem?

I am not looking for a solution, but more or less your adventure notes on the pitfalls of this topic.

My current solution uses a type of pooling mechanism to see what's done and what's not. Managing the processes sequentially (i.e. system) would solve my problem, but I am trying to run things in parallel (i.e. exec... but not exactly; I'm looking for an alternative).

-P0w3rK!d

Replies are listed 'Best First'.
Re: Caught Forking() around again
by BazB (Priest) on May 17, 2002 at 17:54 UTC

    There are several options, some of which were pointed out in this recent node.

    I'll go on about forking().
    You can fork() then exec() if you want to run external commands, or fork() and run a block of code if you want to do everything in Perl.
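
    Just to illustrate, a minimal sketch of both (the external command and the do_work() sub are made-up placeholders):

        use strict;
        use warnings;

        # fork() then exec() an external command
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            exec('/bin/ls', '-l') or die "exec failed: $!";
        }

        # fork() and run a block of Perl code instead
        my $pid2 = fork();
        die "fork failed: $!" unless defined $pid2;
        if ($pid2 == 0) {
            do_work();    # placeholder for the work you'd do in Perl
            exit 0;
        }

        # parent reaps both children
        waitpid($_, 0) for ($pid, $pid2);

        sub do_work { print "child $$ doing Perl work\n" }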

    Parallel::ForkManager is a nice module which may help you fork and track tasks that are to be run in parallel.

    Hand-rolling your own forking and process-tracking subroutines/modules might be another option - co-ordinating a number of dependent tasks is not easy.
    If you can perform a number of tasks, wait for all of them to finish, check the statuses, then execute another block of parallel tasks, it's not impossible; it just takes a bit of thinking :-)
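
    A rough sketch of that batch-then-wait pattern (task_a.pl and friends are hypothetical script names):

        use strict;
        use warnings;

        # two "waves" of tasks; the second wave depends on the first
        my @waves = (
            [ 'task_a.pl', 'task_b.pl' ],   # can run in parallel
            [ 'task_c.pl' ],                # needs the first wave done
        );

        for my $wave (@waves) {
            my %kids;
            for my $task (@$wave) {
                my $pid = fork();
                die "fork failed: $!" unless defined $pid;
                if ($pid == 0) {
                    exec('perl', $task) or die "exec failed: $!";
                }
                $kids{$pid} = $task;
            }

            # block until every child in this wave is reaped, checking statuses
            while (%kids) {
                my $pid = wait();
                last if $pid == -1;
                my $status = $? >> 8;
                die "$kids{$pid} exited with $status, stopping here\n" if $status;
                delete $kids{$pid};
            }
        }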

    If you want to know when a child process has finished, you could either:

    • Have the parent start all the children, then loop calling waitpid() in a non-blocking manner (see the perldoc, and the sketch after this list)
    • Fork the children and use wait() or waitpid() to reap all of the children, then continue processing.
    • Use a signal handler to catch SIGCHLD signals (which tells the parent that one of its children has finished) then reap the processes - see perldoc sigtrap
    There are probably other ways as well.
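
    Here's a small sketch of the first option, non-blocking waitpid() with WNOHANG (the dummy children just sleep):

        use strict;
        use warnings;
        use POSIX ":sys_wait_h";

        my %running;
        for my $n (1 .. 5) {
            my $pid = fork();
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {
                sleep 1 + int rand 5;   # stand-in for real work
                exit 0;
            }
            $running{$pid} = $n;
        }

        # non-blocking reap loop: poll with WNOHANG, free to do other work in between
        while (%running) {
            while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
                print "child $pid (task $running{$pid}) finished\n";
                delete $running{$pid};
            }
            sleep 1;   # small delay so the parent doesn't spin
        }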

    My personal solution was to roll a module to handle fork()'ing and another to dole out tasks, track their return codes, and resubmit them to be processed again if they failed.
    I just looped with a small delay to stop the parent eating a lot of CPU time.
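
    A stripped-down sketch of that kind of dispatch loop (job1.pl and friends are hypothetical task names, and a real module would want much more error handling):

        use strict;
        use warnings;
        use POSIX ":sys_wait_h";

        my $MAX_KIDS  = 4;
        my $MAX_TRIES = 3;

        # hypothetical task list: each entry is a command plus a retry count
        my @queue = map { { cmd => $_, tries => 0 } } ('job1.pl', 'job2.pl', 'job3.pl');

        my %running;   # pid => task
        while (@queue || %running) {

            # top up the pool of running children
            while (@queue && keys(%running) < $MAX_KIDS) {
                my $task = shift @queue;
                my $pid  = fork();
                die "fork failed: $!" unless defined $pid;
                if ($pid == 0) {
                    exec('perl', $task->{cmd}) or die "exec failed: $!";
                }
                $running{$pid} = $task;
            }

            # reap finished children, check return codes, resubmit failures
            while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
                my $status = $? >> 8;
                my $task   = delete $running{$pid};
                if ($status != 0 && ++$task->{tries} < $MAX_TRIES) {
                    push @queue, $task;   # failed: try it again later
                }
            }

            sleep 1;   # small delay so the parent doesn't eat CPU
        }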

    Hope that helps.
    BazB.

Re: Caught Forking() around again
by yodabjorn (Monk) on May 17, 2002 at 21:13 UTC
    I use Parallel::ForkManager for most of my forking these days.
    It sets up nice callbacks for the start/wait/finish of children.

    Here's a snippet:
        # when a child spawns
        $pm->run_on_start( sub {
            my ($pid, $ident) = @_;
            log_report("Parent Starting Child $pid Sending to $ident");
        });

        # when a child finishes
        $pm->run_on_finish( sub {
            my ($pid, $exit_code, $ident, $error) = @_;
            log_report("Child at $pid completed on $ident with code:$exit_code");
        });

        # while we wait..
        $pm->run_on_wait( sub { log_report("Waiting for children ...") } );
    Of course this is pretty simplistic :-) Read the docs for more info. I imagine you could build up some sort of execution chain for a task and then independently walk that chain, assuming you know that task X requires tasks A and B - something like the sketch below.
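
    For what it's worth, a very rough sketch of that two-stage idea (run_task() and the task names are hypothetical):

        use strict;
        use warnings;
        use Parallel::ForkManager;

        my $pm = Parallel::ForkManager->new(4);

        # task_x depends on task_a and task_b, so run them as two stages
        my @stages = ( [ 'task_a', 'task_b' ], [ 'task_x' ] );

        for my $stage (@stages) {
            for my $task (@$stage) {
                $pm->start($task) and next;   # parent moves on to the next task
                run_task($task);              # hypothetical worker sub
                $pm->finish(0);
            }
            $pm->wait_all_children;           # don't start task_x until a and b are done
        }

        sub run_task { my ($name) = @_; print "running $name in $$\n" }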