axl163 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perl Monks,
I come before you once again begging for your assistance. I have a shell script that I want to run x number of times (each run is independent of the others) on a dual-CPU machine. However, the current script only uses one CPU, and my goal is to use the machine's full processing power. To run 2 processes (each handling half of the total runs and each using one CPU), is fork() my answer? This is the way my code stands now:
for (my $x = 0; $x < $total_number_runs; $x++) {
    exec("shell_script < input_file");
}
If fork() is the answer, would this be the type of structure I would use (as recommended by Roy Johnson)?
for my $x (1..$total_number_of_runs) {
    my $pid = fork;
    if (not defined $pid) {
        die("Unable to create child process: $!\n");
    }
    exec("shell_script < $input_file");
}
Any advice would be greatly appreciated. Thanks,
Perl noob

Re: Is fork() my answer to using dual-cpu?
by samtregar (Abbot) on Oct 25, 2005 at 18:52 UTC
    Do yourself a favor and read perlipc. After reading it you should fully realize why you don't want to write your own forking code, so go to CPAN and download Parallel::ForkManager. From the sound of things it should make short work of your problem.

    -sam

      This is exactly what I would suggest, with the added comment that I have used this approach with excellent results.

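      use Parallel::ForkManager;

      my $pm = Parallel::ForkManager->new(2);    # at most 2 children at a time
      foreach my $input_file (@files) {
          my $pid = $pm->start and next;         # parent: move on to the next file
          system("shell_script < $input_file");
          $pm->finish;                           # Terminates the child process
      }
      $pm->wait_all_children;                    # block until every child is done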
      Your actual code may vary, depending on requirements

      This works well on both Windows and UNIX, and as long as the OS manages the SMP tasks correctly, your child processes (threads, under Perl's fork emulation on Windows) will use both CPUs if needed. A single child will only use one CPU, obviously.

      Updates:

      • 2005-10.Oct-26 : updated code to change exec() to system(). Thanks to axl163 for pointing that out!
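The exec() to system() change in the update above matters because exec() replaces the current process and, on success, never returns, so nothing after it in the child would run. system() starts the command, waits for it, and then hands control back. A minimal sketch of that behaviour follows; `run_and_report` is a hypothetical helper name, and the running perl binary (`$^X`) stands in for shell_script so the example runs anywhere.

```perl
use strict;
use warnings;

# Run a command with system() and return its exit code.
# system() starts the command, waits for it to finish, then returns here.
sub run_and_report {
    my (@cmd) = @_;
    my $status = system(@cmd);
    die "could not start @cmd: $!\n" if $status == -1;
    return $status >> 8;    # the exit code is in the high byte of the status
}

# $^X (the current perl) is only a stand-in for shell_script.
for my $code (0, 3) {
    my $got = run_and_report($^X, '-e', "exit $code");
    print "child exited with $got, and the loop keeps going\n";
}
```

Because control returns after each command, a call such as `$pm->finish` placed after system() is actually reached; placed after exec(), it would not be.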

      <-radiant.matrix->

        Actually, you probably want:

        $pm = Parallel::ForkManager->new(3);

        unless your script has absolutely no disk I/O. Otherwise you will find that you have a processor sitting idle waiting on disk I/O. You might see an improvement with numbers greater than # CPUs + 1, but only in systems with highly parallelized I/O chains.


Re: Is fork() my answer to using dual-cpu?
by Roy Johnson (Monsignor) on Oct 25, 2005 at 18:35 UTC
    fork is the easy answer, but your forking loop isn't quite right. You will always end up spawning two processes, because both the parent and the child exec within the loop. Try
    for my $x (1..$total_number_of_runs) {
        my $pid = fork;
        if (not defined $pid) {
            die("Unable to create child process: $!\n");
        }
        exec("shell_script < $input_file") unless $pid;    # thanks, bluto
    }
    The parent launches all the children, then exits.
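The loop above starts every child at once and the parent does not wait for them. If you want to stay with plain fork and still keep only two jobs running at a time, one possible sketch is below. The sub name `run_limited` is hypothetical, and it takes the commands as ready-made strings. It is an illustration of the fork/wait pattern under those assumptions, not a replacement for the modules recommended elsewhere in this thread.

```perl
use strict;
use warnings;

# Run each command in its own child process, keeping at most $max_procs
# children alive at once. Returns how many children exited non-zero.
sub run_limited {
    my ($max_procs, @commands) = @_;
    my ($running, $failed) = (0, 0);

    for my $cmd (@commands) {
        # At the limit: block until one child exits before forking another.
        if ($running >= $max_procs) {
            my $done = wait();
            $failed++ if $done > 0 && $? != 0;
            $running--;
        }

        my $pid = fork();
        die "Unable to create child process: $!\n" unless defined $pid;

        if ($pid == 0) {
            # Child: replace this process with the command.
            exec($cmd) or die "exec failed: $!\n";
        }
        $running++;    # parent: one more child in flight
    }

    # Reap the remaining children; wait() returns -1 once none are left.
    while (wait() != -1) {
        $failed++ if $? != 0;
    }
    return $failed;
}

# Hypothetical usage, with @input_files standing in for the real inputs:
# my $failed = run_limited(2, map { "shell_script < $_" } @input_files);
```

Unlike the loop above, the parent here stays around until the last child has been reaped, which also lets it report how many runs failed.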

      You'll want that exec to be something like...

      exec("shell_script < $input_file") unless $pid;

      ...since otherwise the parent will exec as well. FWIW, when I use fork() most of the time I write it in the following "template", comments and all ...

      my $pid = fork();
      die "failed to fork: $!" unless defined $pid;
      unless ($pid) {
          # child
          die;
      }
      # parent

      ... and then fill in the code later. That way it's harder for me to mess it up. :-)

Re: Is fork() my answer to using dual-cpu?
by duckyd (Hermit) on Oct 25, 2005 at 18:35 UTC
    You probably don't want to exec the shell script in the parent - fork $total_number_of_runs times, and exec only in the children.
Re: Is fork() my answer to using dual-cpu?
by salva (Canon) on Oct 26, 2005 at 20:42 UTC
    Use Proc::Queue to limit the maximum number of concurrently forked processes:

    use Proc::Queue size => 2, qw(system_back);

    for (1..$total_number_of_runs) {
        system_back "shell_script < $input_file";
    }
    # and wait for all your children to finish
    1 while wait != -1;