http://qs1969.pair.com?node_id=291446

Introduction to Parallel::ForkManager

Introduction

The goal of this tutorial is to demonstrate how to use Parallel::ForkManager, a simple and powerful Perl module available from CPAN. Parallel::ForkManager is a simple and powerful module that can be used to perform a series of operations in parallel within a single Perl script. It is especially well-suited to performing a number of repetitive operations on a relatively powerful machine, especially when working on a multiprocessor machine. This module uses object-oriented syntax, if that frightens you then should read some of the Object Oriented Perl tutorials.

Usage

One caveat to using Parallel::ForkManager is that you must instantiate the Parallel::ForkManager object with a number representing the maximum number of processes to fork. Here is an example of the syntax:

my $manager = new Parallel::ForkManager( 20 );

In many cases, this maximum number of processes to fork will also be the actual number of processes forked by your program. In this case, it is very important to choose this number carefully, as forking a large enough number of processes is enough to bring even the mightiest of machines to it's knees. Also, you can change this number later in your program as needed with the following method:

$manager->set_max_procs( $newMaximumProcs );

After instantiating a Parallel::ForkManager object, you can start forking processes using the start method. It is important to also define the point at which the child processes will finish. This is usually performed within a for or while loop, so the syntax will look like this:

foreach my $command (@commands) { $manager->start and next; system( $command ); $manager->finish; };

The line within the for loop is a common idiom used for Parallel::ForkManager, it starts running the command via a forked process and advances to the next command in the @command array. The start method takes an optional parameter named $process_identifier, which can be used in callbacks (see Callbacks section).

Another useful method in the Parallel::ForkManager class is the wait_all_children method. It performs a blocking wait on the parent program that waits until all forked processes have finished.

Callbacks

It is possible to define callbacks to child processes, which are blocks of code that are called at various points of the execution of your processes. There are three forms of callbacks:

Callbacks are defined using the run_on_start, run_on_finish, and run_on_wait methods, which take subroutines (or references to subroutines) as arguments. The arguments provided to the subroutine differ depending on which form of callback you are defining.

Here's an example of the run_on_start method:

$manager->run_on_start( sub { my ($pid,$ident) = @_; print "Starting processes $ident under process id $pid\n"; } );

The arguments passed to the run_on_start sub are the process id of the forked process (provided by the operating system), and an identifier for the process that can be defined in the start method of the Parallel::ForkManager process. You should remember this in case that you don't provide an identifier in the call to start, this will make $ident be undefined and cause the Perl interpreter to complain (if you are using strict and warnings).

Here's an example of the run_on_finish method:

$manager->run_on_finish( sub { my ( $pid, $exit_code, $ident, $signal, $core ) = @_; if ( $core ) { print "Process $ident (pid: $pid) core dumped.\n"; } else { print "Process $ident (pid: $pid) exited print "with code $exit_code and signal $signal.\n"; } } );

This callback prints useful messages upon completion of the process. One caveat is that $ident must be defined in the start method of each process for this to work, otherwise this code needs to be modified.

The run_on_wait subroutine is a bit different. It is called when the Parallel::ForkManager object needs to wait for something, such as waiting for startup, starting, and waiting for processes to exit. It takes both a subroutine (or subroutine reference) and a optional argument $period, which defines the number of seconds to wait before calling the method again. Here's an example of it's usage:

$manager->wait_on_finish( sub { print "Waiting ... \n"; }, 3 );

This example prints its message about every 3 seconds. In the notes for the latest version of Parallel::ForkManager, it says that the exact period of time is not guaranteed and can vary slightly according to system load. If the second argument is not provided, then the subroutine will be called after the appropriate wait during the start and wait_on_children methods.

Bugs and Limitations

These are straight from the Parallel::ForkManager perldoc, three caveats are provided:

Other Resources

One of the most valuable sources of information on this module is the Perldoc formatted, documentation is available on systems that have Parallel::ForkManager installed and from CPAN.