In the easy case where data distribution is no problem for you, because all machines access the data on the same (NFS or NAS) share, simply running the program on the different machines through runN and ssh -c is likely the easiest solution. This doesn't give you a fancy job status overview, throughput charts, automatic load balancing, or job restarts, but on the other hand, it's just a script plus the effort of setting up passwordless keys to the other machines.
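A minimal sketch of that loop, assuming hostnames node1..node3, a shared mount at /mnt/shared, and a process.pl that takes a slice number (all made-up names). It only prints the ssh commands it would run, so you can inspect them before letting it loose:

```shell
#!/bin/sh
# Sketch: fan the same program out over machines that all mount the same
# NFS/NAS share, so only the command travels, not the data.  The host
# names, /mnt/shared, and process.pl are assumptions, not real names.

HOSTS="node1 node2 node3"        # machines with passwordless SSH keys set up

i=0
for host in $HOSTS; do
    # Dry run: print each command.  To really dispatch, drop 'echo' and
    # append '&' so the remote jobs run in parallel, then 'wait' after
    # the loop to block until they all finish.
    echo ssh "$host" "cd /mnt/shared && ./process.pl --slice $i"
    i=$((i + 1))
done
```

This is the whole "it's just a script" point: a static split of the work (every host gets slice number i), no scheduler, no restarts.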
Alternatively, you could look into what bashreduce does and consider how to adapt it to your case, or look into the Perl modules for GRID or SSH, or even a job queue like Gearman or TheSchwartz.
"I a perl function"
Next step, and it's "I, Robot"?
"Is there any way I could use other idle servers to distribute data processing, so that the same function could process the data on 2-3 different servers instead of one server?"
That depends on the problem. Sometimes the answer is yes. It'll require some coding, both for a framework to distribute the data and to collect and combine the answers, and, possibly, to change your algorithm to work on just part of the data.
Have you run Devel::DProf (or otherwise profiled) your existing code? Before you throw hardware at a problem, you should see if you can write better algorithms.
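Profiling with Devel::DProf is just two commands; the script name here is a placeholder:

```shell
perl -d:DProf yourscript.pl   # run under the profiler; writes tmon.out
dprofpp                       # summarize tmon.out: time spent per subroutine
```

On newer Perls, Devel::NYTProf gives much richer reports, but the workflow is the same: run once under the profiler, then read the report to find the subroutines actually eating your time.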