sickboy has asked for the wisdom of the Perl Monks concerning the following question:

    $n=0;
    @array=("one", "two", "three");
    foreach $line (@array) {   #1
        $pid=fork and next;
        &ret;
    }
    print "$a\n";              #2
    print "Task done\n";

    sub ret {
        print "$line\n";       #3
        $a="something\n";
        exit();
    }
I am a couple days into this fork thing, so speak slowly.
1. Is this an OK way to fork off a process? If I had a couple thousand records, would this exhaust an average PC?
2. How can I wait here until all of the elements of the array have been passed through sub ret?
3. How do I store this for use in the parent process?

Re: basic fork question
by gav^ (Curate) on Jan 19, 2002 at 06:07 UTC
    If you use Parallel::ForkManager you can write:
        use Parallel::ForkManager;

        my $pm = Parallel::ForkManager->new(30);   # at most 30 children at once
        foreach my $line (@array) {
            $pm->start and next;   # parent: move on to the next item
            # process here (child)
            $pm->finish;           # child exits here
        }
        $pm->wait_all_children;
    All you have to do is decide on how many children you want :)

    gav^

Re: basic fork question
by stephen (Priest) on Jan 19, 2002 at 06:53 UTC
    (I've been out of the game for a bit, so all of this goes into the "if I recall correctly" category)

    Whenever you do a fork() call, the system:

    1. Creates a new system-level thread.
    2. Makes a duplicate of your existing process.
    3. Does some other housekeeping.

    This is an oversimplification, and as such is totally inaccurate. :) Even so, it shows that every fork() costs time and (frequently) memory. Also, if you're processing a bunch of items and have to wait until they're all done before you move on to the next step, forking off subprocesses is frequently no faster than doing the whole thing step by step. (This is especially true on a single-processor machine when the work is CPU-bound: extra processes won't make it faster. They will just fight each other for time on the processor.)

    This is only my opinion, but I would REALLY not fork off one process per record. Whether it would bog down an average PC would depend on how processor-intensive the task is.

    If forking off background processes still looks good, it's generally a good idea to decide on a maximum number of child processes, create them as necessary, and parcel out tasks to them as appropriate. I'll leave exactly how to do that to you. :)
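    One way to cap the number of children, sketched in core Perl only: the parent keeps a table of running children. When the table is full, it blocks in wait() until one exits, then forks the next. The record list and the cap of 4 are hypothetical values for illustration. This is the same idea Parallel::ForkManager packages up, not that module's implementation.

```perl
use strict;
use warnings;

# Hypothetical inputs for illustration: 20 records, at most 4 children alive.
my @records      = (1 .. 20);
my $max_children = 4;

my %running;     # pid => record that child is handling
my @finished;    # records whose child has been reaped

for my $record (@records) {
    # At the cap: block until any one child exits, freeing a slot.
    if (scalar(keys %running) >= $max_children) {
        my $done = wait();
        push @finished, delete $running{$done} if $done > 0;
    }

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: do the work for $record here, then exit so it never
        # goes back around the loop and forks children of its own.
        exit 0;
    }

    $running{$pid} = $record;    # parent: track the new child
}

# Reap the children that are still running.
while ((my $done = wait()) != -1) {
    push @finished, delete $running{$done};
}

printf "handled %d records, at most %d at a time\n",
    scalar(@finished), $max_children;
```

    Because the parent only ever has $max_children processes alive, a list of thousands of records never turns into thousands of simultaneous processes.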

    By "how do I store this for use in the parent process", I think you mean "How do I get the value of a changed variable in a child process back to the parent process?" The answer is: not directly. Your child process is a separate process with its own copy of the Perl interpreter and all of its variables, so a change it makes is never seen by the parent. If you want to send a result back to the parent process, you'll need to write it through an open filehandle such as a pipe or named pipe (or to a file the parent reads afterwards). Look at the perlipc manpage for details.
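    A minimal sketch of the pipe approach, using only core Perl. Before each fork the parent opens a pipe. The child writes its answer to the write end and exits. The parent reads from the read end and stores the answer in a hash. Here uc() is only a stand-in for real work, and each result is assumed to be a single short line.

```perl
use strict;
use warnings;

my @array = ("one", "two", "three");
my %result;                              # filled in by the parent
my @kids;                                # [pid, item, read handle] per child

for my $line (@array) {
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {                     # child
        close $reader;
        print {$writer} uc($line), "\n"; # stand-in for the real work
        close $writer;
        exit 0;
    }

    close $writer;                       # parent keeps only the read end
    push @kids, [$pid, $line, $reader];
}

for my $kid (@kids) {
    my ($pid, $line, $reader) = @$kid;
    my $answer = <$reader>;              # blocks until the child writes or exits
    close $reader;
    waitpid($pid, 0);                    # reap the child
    chomp $answer if defined $answer;
    $result{$line} = $answer;
}

print "$_ => $result{$_}\n" for sort keys %result;
```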

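    The function you're looking for to wait until all the child processes are finished is the wait() function. Each call to wait() reaps only one child, so keep calling it until it returns -1 (no children left). You'd do it like so:

        use strict;
        use warnings;

        sub ret($);

        MAIN: {
            my @array = (0 .. 100);
            foreach my $line (@array) {
                my $pid = fork;
                die "fork failed: $!" unless defined $pid;
                next if $pid;          # parent: go fork the next child
                ret($line);            # child: do the work...
                sleep(1);
                exit();                # ...and exit so it never re-enters the loop
            }
            1 while wait() != -1;      # one child per call; -1 means none are left
            print "Task done\n";
        }

        sub ret($) {
            my ($line) = @_;
            print "$line\n";
        }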

    Trying to run it from 1 to 1000 bombs out at around 571 with this script on my machine (presumably once fork() hits the system's limit on processes), so once again, forking off 1000 children is a Bad Idea. :)

    Good luck!

    stephen

      I modified this example code for my own purposes and found that even if I move the exit(); statement out of the foreach loop, my script exits right after forking. More exactly, it lets the forked children run on their own, finishes the rest of the script, and then exits. How do I make the script wait for the forks to finish before proceeding?
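      A likely cause, hedged because the modified code isn't shown: a single wait() call returns as soon as any one child exits, so the parent carries on while the others are still running. Also, if exit() is moved out of the loop, each child keeps running the loop and the rest of the script itself. A common fix is to keep every PID that fork() returns and call waitpid() on each one before continuing. In this minimal sketch, the item list and the one-second sleep are placeholders for real work.

```perl
use strict;
use warnings;

my @array = (1 .. 5);    # stand-in for the real list of items
my @pids;                # every child the parent has started

for my $item (@array) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: do the work for $item, then exit so it does not
        # fall through into the rest of the loop or the script.
        sleep 1;
        exit 0;
    }
    push @pids, $pid;    # parent: remember the child and keep looping
}

# Parent blocks here until every child it started has finished.
waitpid($_, 0) for @pids;

print "Task done\n";
```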
Re: basic fork question
by jonjacobmoon (Pilgrim) on Jan 19, 2002 at 06:56 UTC
    While gav^'s response is right on the money, you might want to try some experiments with forking on your own before falling back on the module. Have you read the "fork" description in the Camel book or the perlfunc docs? If you are just getting started on understanding how it works, you might start there.

    To answer your questions directly:

    1. You want to check the value of $pid that fork returns:

        # mostly from the camel book
        if ($pid = fork) {
            ....    # parent
        } elsif (defined $pid) {
            ...     # child
            exit;
        }

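    But if your forking creates thousands of processes at once, you will likely overload your system and get locked out (fork will start failing once you hit the process limit). As long as you have an exit in the child, you at least avoid the worse problem of every child continuing through the loop and forking children of its own.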

    2. To make the parent wait until all the children are finished, you can use a signal: the system sends the parent a SIGCHLD signal each time a child exits. I am not an expert on the syntax here, maybe others can add to that, but basically you can set up a handler so the parent waits until the children have signalled that they are done.
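    A sketch of the signal-based approach in core Perl. The only extra piece it uses is the WNOHANG constant from the POSIX module. The parent installs a $SIG{CHLD} handler that reaps children with non-blocking waitpid() and counts them, then sleeps until the count matches the number it started. Three children and the one-second sleep in each child are placeholders. For a plain "wait for everything", a blocking wait() loop is simpler. A handler earns its keep when the parent has other work to do in the meantime.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";    # for WNOHANG

my $children = 3;
my $reaped   = 0;

# Install the handler before forking so no child exit is missed.
$SIG{CHLD} = sub {
    local ($!, $?);         # don't clobber these for the main program
    # Several exits can arrive as one signal, so reap in a loop.
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $reaped++;
    }
};

for (1 .. $children) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {        # child: placeholder work, then exit
        sleep 1;
        exit 0;
    }
}

# Parent: SIGCHLD interrupts sleep; loop until every child is counted.
sleep 1 until $reaped >= $children;

print "all $reaped children reaped\n";
```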

    3. I am not sure I understand your third question. Store what? The array? The parent already has the array. Can you clarify?

    UPDATE: Marcello is right, of course. I didn't mean my code to be usable but just a cursory example. I hope I didn't mislead anyone.


    I admit it, I am Paco.
      I would like to suggest a better way of handling a fork:
          if (!defined($pid = fork())) {
              # FAILURE
              # parent process
          } elsif ($pid == 0) {
              # SUCCESS
              # child process
          } else {
              # SUCCESS
              # parent process
          }
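      The same three-way check, filled out into a small runnable sketch. The failure branch dies with the error from $!. The child does its work and exits. The parent reaps it with waitpid(). As a hedged aside on question 3, the child here passes back a tiny result through its exit status, which the parent reads as $? >> 8. Exit values only range from 0 to 255, so this suits success/failure flags or small integers, not real data. The value 42 is arbitrary.

```perl
use strict;
use warnings;

my $status;                  # child's exit value, as seen by the parent
my $pid = fork();

if (!defined $pid) {
    # FAILURE: no child was created; we are still the only (parent) process.
    die "cannot fork: $!";
} elsif ($pid == 0) {
    # SUCCESS, child process: do the work, then report via the exit code.
    exit 42;
} else {
    # SUCCESS, parent process: $pid holds the child's process ID.
    waitpid($pid, 0);
    $status = $? >> 8;       # high byte of $? is the child's exit value
    print "child $pid exited with $status\n";
}
```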
      This one is more fail-safe, since your version does not handle the case where fork() fails.