Silas has asked for the wisdom of the Perl Monks concerning the following question:

I can find lots of examples of how to set up child process spawning/forking for a client/server model, but I need to set up child process spawning within a script that's run only once.

The basic process is "for each item in a list, script calls a subroutine with that item as input, and returns the corresponding output". I need to simultaneously execute the subroutine for each item in the list.

Using the "spawn" subroutine on page 351 of Programming Perl, I tried this:

my $output;
foreach my $item (@items) {
  &spawn(sub { $output .= &search($item); });
}
At best, this produces a loop that I couldn't begin to debug.

I know this has been done before - multi-threaded searches, etc. seem commonplace. Anyone have any code snippets?

Re: Fork/Spawn
by turnstep (Parson) on Mar 27, 2000 at 22:18 UTC

    I think I know what you are asking, but I'm not sure.

    You want to, for example, write a spell-check program in which each letter is handled by its own child process:

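        ## Extremely simple code
        for $x ('A'..'Z') {
            $pid = fork;
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {    ## fork returns 0 in the child, the PID in the parent
                ## This child checks the letter $x
                exit;
            }
        }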

    The problem is that you want to "simultaneously execute the subroutine" AND you want the children to return their results in the exact order they were called. There is no simple way to have both: since each child is doing a different task, you have no way of knowing how soon each will finish. You can have the parent wait for a child to finish before starting the next child, but that's the same as a non-forked loop.

    If the return order does not matter, you can just gather the children's results as they finish. (How exactly you do this depends on a few things: appending to a temporary file may be the easiest way.)

    If the order DOES matter, you must have a way for the parent to identify the children. In the example above, the child could output its "letter" as the first character of its result. Then it is up to the parent to sort out the results and display them in a pretty manner.
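    A minimal sketch of another way to identify the children (an illustration, not code from this thread): the parent records which item each child PID is handling, and wait() hands back PIDs as children finish, in whatever order that happens. The item list is hypothetical, and the exit status used as the "result" only holds 0..255, so it is a placeholder for real output sent back over a pipe or file.

```perl
use strict;
use warnings;

my @items = qw(apple kiwi banana);   # hypothetical work list
my %item_for;                         # child PID => item it is handling
my %status_for;                       # item => that child's exit status

for my $item (@items) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: "report" a small result (0..255) via the exit status.
        exit(length $item);
    }
    $item_for{$pid} = $item;          # parent remembers who is who
}

# Reap children in completion order; the PID returned by wait()
# tells the parent which item that child was working on.
while ((my $pid = wait()) != -1) {
    my $item = $item_for{$pid};
    $status_for{$item} = $? >> 8;
    print "$item finished with status $status_for{$item}\n";
}
```

    The printed lines may come out in any order, which is exactly the point made above: completion order is not start order.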

    Here's some quick code illustrating some of the above:

        $tmpfile = "/tmp/results.$^T$$";
        for $x ('A'..'Z') {
            $pid = fork;
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {    ## child: fork returned 0
                $results = &SpellCheck($x);
                if (open(TMPFILE, ">> $tmpfile")) {
                    print TMPFILE "$x|$results\n";
                    close(TMPFILE);
                }
                exit;           ## Done with this child
            }
        }
        ## Wait for all the children to finish, then...
        1 while wait() != -1;
        if (open(TMPFILE, $tmpfile)) {
            @results = <TMPFILE>;
            close(TMPFILE);
            @results = sort @results;   ## Works well for this example!!
        }

    In the example above, @results will contain the results from all 26 children, sorted in alphabetic order.

Re: Fork/Spawn
by Silas (Novice) on Mar 29, 2000 at 01:58 UTC
    This is helpful; a few more refinements to my question:

    The return order doesn't matter very much; as long as I can pass something unique to each process ($item in my example, $x in yours), I can use that to produce output that identifies who it came from.

    Is there a way to gather the collective output using a variable rather than a temporary file? Is there any good reason not to just append to a string (like $output in my example)?

    Lastly, I need to be able to limit the life of the child process; if the search takes too long or something else happens, I need to be able to kill it off. I can probably find examples of this somewhere, but it would be useful to have an example from someone who already knows my problem space.

    Thanks for your help.

      The best way I know of to do the latter is not specifically documented in perlipc. Use alarm:
          alarm 10;   # send me a signal in 10 seconds
          # ... do some code which may not complete ...
          alarm 0;    # if we get here in time, disable the alarm
      If you're going to fork off children, be sure to read the "Signals" section of perlipc. You might even do something like this in your child processes:
          eval {
              local $SIG{ALRM} = sub { die "child process took too long to complete\n" };
              alarm 10;
              # do whatever your little heart desires, hope it's fast enough
              alarm 0;   # whew, just made it
          };
          alarm 0;       # in case the eval died for some other reason first
          if ($@ eq "child process took too long to complete\n") {
              # timed out: abort the child process gracefully
          } elsif ($@) {
              # some other error inside the eval
          } else {
              # send the parent the information ?
          }
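      The alarm trick runs inside each child. If you would rather have the parent enforce the limit (the "kill it off" part of the question), one possibility is sketched below: the parent polls with waitpid(-1, WNOHANG) and sends SIGTERM to any child still running past its deadline. The 2-second limit, the item names, and the sleep standing in for a slow search are all made up for illustration.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";                  # WNOHANG
use Time::HiRes qw(sleep time);           # fractional sleep() and time()

my $timeout = 2;                          # seconds each child may run (assumed)
my %nap_for = (fast => 0.2, slow => 30);  # hypothetical work durations
my (%item_for, %deadline_for, %outcome);

for my $item (sort keys %nap_for) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        sleep $nap_for{$item};            # stand-in for a search that may hang
        exit 0;
    }
    $item_for{$pid}     = $item;
    $deadline_for{$pid} = time + $timeout;
}

while (%item_for) {
    # Reap every child that has already exited, without blocking.
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        my $item = delete $item_for{$pid};
        $outcome{$item} //= "finished" if defined $item;
    }
    # SIGTERM any child still running past its deadline; the loop
    # above reaps it on a later pass.
    for my $pid (keys %item_for) {
        my $item = $item_for{$pid};
        next if $outcome{$item} || time < $deadline_for{$pid};
        kill 'TERM', $pid;
        $outcome{$item} = "killed";
    }
    sleep 0.1;
}
print "$_: $outcome{$_}\n" for sort keys %outcome;
```

      Enforcing the limit from the parent also covers a child that is stuck somewhere its own alarm handler cannot help, at the cost of a polling loop.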
      Have you looked at perlipc? It has some very useful examples of using pipe and fork to do bi-directional communication between a process and "itself." Another option would be to use socketpair (this is also described in perlipc).
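      A rough sketch of the pipe-and-fork idea applied to this problem, written for illustration rather than taken from perlipc: one pipe per child, with the parent reading every child's output into a hash, so no temporary file is involved. search() and the item list are hypothetical stand-ins for the real routine and data.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real search routine.
sub search {
    my ($item) = @_;
    return uc $item;
}

my @items = qw(alpha beta gamma);
my %reader_for;                    # item => read end of that child's pipe

for my $item (@items) {
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: keep only the write end, send the result, and exit.
        close $reader;
        print {$writer} search($item);
        close $writer;
        exit 0;
    }
    # Parent: keep only the read end.
    close $writer;
    $reader_for{$item} = $reader;
}

# Read each child's complete output into a hash.
my %result_for;
for my $item (@items) {
    my $fh = $reader_for{$item};
    local $/;                      # slurp until the child closes its end
    $result_for{$item} = <$fh>;
    close $fh;
}
1 while wait() != -1;              # reap all children

print "$_ => $result_for{$_}\n" for @items;
```

      Because each child has its own pipe, the parent always knows which output belongs to which item, whatever order the children finish in.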

      Yet another option would be to write a client/server app using sockets; you can set up a server that has the list of files to search, then set up some clients that communicate via sockets with the server. The server sends a filename to a client, and the client searches it, then sends back the results over the socket.
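      The hand-out-work-and-collect-results pattern can be tried in a single script by swapping the network sockets for socketpair, as in the sketch below. That substitution is an assumption for illustration (the idea above uses real client/server sockets); the worker count, the item list, and search() are hypothetical. The parent acts as the server and gives a worker its next item each time it gets a result back.

```perl
use strict;
use warnings;
use Socket;                   # AF_UNIX, SOCK_STREAM, PF_UNSPEC
use IO::Handle;               # autoflush()
use IO::Select;

# Hypothetical stand-in for searching one file.
sub search { my ($item) = @_; return scalar reverse $item }

my @queue   = qw(one two three four five);   # e.g. the list of files
my $workers = 2;
my (%result_for, @server_ends);

for (1 .. $workers) {
    socketpair(my $server_end, my $worker_end, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
        or die "socketpair failed: $!";
    $server_end->autoflush(1);
    $worker_end->autoflush(1);
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Worker: drop the server's ends (including earlier workers'),
        # then answer each item it is sent until the server closes.
        close $_ for $server_end, @server_ends;
        while (my $item = <$worker_end>) {
            chomp $item;
            print {$worker_end} "$item\t", search($item), "\n";
        }
        exit 0;
    }
    close $worker_end;
    push @server_ends, $server_end;
}

# Server: give each worker one item, then hand out the next item every
# time a worker sends back a result. Each worker has at most one line in
# flight, so buffered <$fh> reads are safe alongside select here.
my $select = IO::Select->new;
for my $fh (@server_ends) {
    if (@queue) { print {$fh} shift(@queue), "\n"; $select->add($fh) }
    else        { close $fh }
}
while ($select->count) {
    for my $fh ($select->can_read) {
        my $line = <$fh>;
        if (defined $line) {
            chomp $line;
            my ($item, $result) = split /\t/, $line, 2;
            $result_for{$item} = $result;
            if (@queue) { print {$fh} shift(@queue), "\n"; next }
        }
        $select->remove($fh);   # no more work (or the worker died)
        close $fh;              # EOF tells the worker to exit
    }
}
1 while wait() != -1;
print "$_ => $result_for{$_}\n" for sort keys %result_for;
```

      Handing out one item at a time keeps the load balanced: a worker that gets a quick item simply comes back for another sooner.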

      You wouldn't need to use temporary files in any of these implementations.

      Just some ideas. Definitely take a look at perlipc.

Re: Fork/Spawn
by turnstep (Parson) on Mar 29, 2000 at 20:29 UTC

    No, don't just append to a string; you need one of the ways listed above (pipe, client/server). Appending to a string won't do what you want: each child works on its own copy of the parent's variables, so whatever a child appends to $output vanishes when that child exits, and the parent never sees it. The solution to your problem really depends on how many children you are trying to fork and how much work each must do. You may not even need to fork at all; the overhead of all the forking, piping, etc. may not be worth the trouble. Try timing the different ways: flat (no children), forked with temp files and few children, piped with many children, etc. Ideally, you want to balance the work performed by each child and find the number of children that gives the fastest overall result. Also keep an eye on the number of children: if this process (e.g. a search engine for a site) is called many times by different people, thousands of children are not a good idea :)
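    A rough way to run that timing experiment, assuming a hypothetical search() that takes a fixed 0.2 s per item: the sketch caps how many children are alive at once (when the cap is reached, it wait()s for one to exit before forking another) and compares wall-clock times against the flat loop. Real numbers depend entirely on the actual work and the machine.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical unit of work: pretend each item takes 0.2 s to search.
sub search { sleep 0.2 }

# Run search() on every item with at most $max children alive at once.
# Returns the wall-clock time taken.
sub run_forked {
    my ($max, @items) = @_;
    my $start   = time;
    my $running = 0;
    for my $item (@items) {
        if ($running >= $max) {      # at the limit: wait for one to exit
            wait();
            $running--;
        }
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) { search($item); exit 0 }
        $running++;
    }
    1 while wait() != -1;            # wait for the stragglers
    return time - $start;
}

my @items = (1 .. 8);
my $flat_start = time;
search($_) for @items;               # "flat": no children at all
my %elapsed = (flat => time - $flat_start);
$elapsed{"$_ children"} = run_forked($_, @items) for (2, 4, 8);
printf "%-12s %.2f s\n", $_, $elapsed{$_} for sort keys %elapsed;
```

    The cap is also the knob that keeps a busy site from spawning thousands of processes: raise it only as far as the timings show it still helps.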