As for the subtask of running 8 parallel processes per machine, Parallel::ForkManager might be easier to get started with, as it would largely hide the lower-level fork/exec/wait details from you.
(And in case you want to learn how things work under the hood, take a look at the module's source — it's just ~150 lines of code, and not all that difficult to understand.)
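To make the suggestion concrete, here is a minimal sketch of the usual Parallel::ForkManager loop, capped at 8 workers as in the original question. `process_file()` is a hypothetical placeholder for whatever per-file work you need to do.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

# Fan out up to 8 workers, one per input file.
my $pm = Parallel::ForkManager->new(8);

for my $file (@ARGV) {
    $pm->start and next;    # parent: returns child's PID, moves to next file
    process_file($file);    # child: returns 0, does the work
    $pm->finish;            # child exits here
}
$pm->wait_all_children;     # parent blocks until every child is reaped

# Hypothetical placeholder for your per-file processing.
sub process_file {
    my ($file) = @_;
    # ... open $file, apply your regexes, write results ...
}
```

The `$pm->start and next` idiom is the heart of it: in the parent, `start` returns the child's PID (true), so the parent skips the work and continues the loop; in the child it returns 0, so the child falls through to the work and then exits via `finish`.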
| [reply] |
I definitely agree...
For its apparent complexity, Parallel::ForkManager was, I thought, surprisingly easy to get working. The example code on the module's page was helpful, and the explanation of the methods was more than sufficient. I'd recommend that module as well.
The number of forked processes to run at one time is easy to set: the maximum process count is the argument to the constructor. Just calculate it ahead of time and drop it in. Much simpler than managing the worker count manually from the parent process.
my $max_procs = 8;
my $pm = Parallel::ForkManager->new($max_procs);
You can also register a run_on_finish callback for post-processing code that runs in the parent as each child exits, and wait_all_children forces the parent to wait for all processes to complete before continuing.
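A short sketch of those two methods together. The callback signature (PID first, exit code second) matches the module's documented interface; `do_work()` is a hypothetical stand-in for the real job.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(8);

# run_on_finish fires in the parent each time a child is reaped;
# the callback receives the child's PID and its exit code.
$pm->run_on_finish(sub {
    my ($pid, $exit_code) = @_;
    warn "child $pid failed with exit code $exit_code\n" if $exit_code;
});

for my $n (1 .. 20) {
    $pm->start and next;        # parent continues the loop
    my $ok = do_work($n);       # child does the job
    $pm->finish($ok ? 0 : 1);   # exit code is handed to run_on_finish
}
$pm->wait_all_children;         # parent waits for every child

# Hypothetical job; replace with your real per-task work.
sub do_work { my ($n) = @_; return 1 }
```

Passing a status through `finish` is a cheap way to get per-child success/failure back to the parent without any shared state.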
HTH. | [reply] [d/l] |
You could use Parallel::ForkManager, which was designed for such cases.
But are you sure you will get any speed-up from this? If processing a text file is not much work (e.g. just a regex applied to every line), your program will spend most of its time waiting for the hard disk. Whether it waits in one process or in several won't change anything about its speed (if all machines and processes access the same data pool/hard disk).
One way to find out is to run your script as a single process and measure the time. Then run it again against a single copy of the text held in memory (no reloading from disk) the same few thousand times. Only the time the second run takes can be reduced by parallelizing.
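The in-memory measurement can be sketched with the core Time::HiRes module. `process_line()` is a hypothetical stand-in for your real per-line work, and the data here is synthetic.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical per-line work, e.g. a regex match.
sub process_line { my ($line) = @_; return $line =~ /foo/ }

# Stand-in for the file's contents, held entirely in memory
# so the timing below measures CPU work only, not disk I/O.
my @lines = ("foo bar\n") x 10_000;

my $t0 = [gettimeofday];
process_line($_) for @lines;
my $cpu_time = tv_interval($t0);

printf "CPU-only pass took %.4f s\n", $cpu_time;
# Compare this against the full run that reads from disk: if the
# CPU-only time is a small fraction of it, the job is I/O-bound
# and forking more workers won't buy you much.
```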
| [reply] |
Thanks all for the answers. The pointers were very helpful. I'm still working on it, but I'm beginning to get somewhere.
| [reply] |