In the easy case where data distribution is no problem for you, because all machines access the data on the same (NFS or NAS) share, simply running the program on the different machines through runN and ssh -c is likely the easiest solution. This doesn't give you a fancy job status overview, throughput charts, automatic load balancing, or job restarts, but on the other hand, it's just a script plus the effort of setting up passwordless keys to the other machines.
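A minimal sketch of that loop, assuming hostnames node1..node3, a shared mount at /mnt/shared, and a process.pl that takes a slice number (all made-up names). It only prints the ssh commands it would run, so you can inspect them before letting it loose:

```shell
#!/bin/sh
# Sketch: fan the same program out over machines that all mount the same
# NFS/NAS share, so only the command travels, not the data.  The host
# names, /mnt/shared, and process.pl are assumptions, not real names.

HOSTS="node1 node2 node3"        # machines with passwordless SSH keys set up

i=0
for host in $HOSTS; do
    # Dry run: print each command.  To really dispatch, drop 'echo' and
    # append '&' so the remote jobs run in parallel, then 'wait' after
    # the loop to block until they all finish.
    echo ssh "$host" "cd /mnt/shared && ./process.pl --slice $i"
    i=$((i + 1))
done
```

This is the whole "it's just a script" point: a static split of the work (every host gets slice number i), no scheduler, no restarts.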
Alternatively, you could look into what bashreduce does and consider how to adapt it to your case, or look into the Perl modules for GRID or SSH, or even a job queue like Gearman or TheSchwartz.
"I a perl function"
Next step, and it's "I, Robot"?
"Is there any way I could use other idle servers to distribute data processing, so that the same function could process the data on 2-3 different servers instead of one server?"
That depends on the problem. Sometimes the answer is yes. It'll require some coding, both for a framework to distribute the data and to collect and combine the answers, and, possibly, to change your algorithm to work on just part of the data.
Have you run Devel::DProf (or otherwise profiled) your existing code? Before you throw hardware at a problem, you should see if you can write better algorithms.
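Profiling with Devel::DProf is just two commands; the script name here is a placeholder:

```shell
perl -d:DProf yourscript.pl   # run under the profiler; writes tmon.out
dprofpp                       # summarize tmon.out: time spent per subroutine
```

On newer Perls, Devel::NYTProf gives much richer reports, but the workflow is the same: run once under the profiler, then read the report to find the subroutines actually eating your time.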