mykbaker has asked for the wisdom of the Perl Monks concerning the following question:

Hello all,

The History:
I have a script that processes data. How that part works isn't so important, but currently I have many instances of this script running from cron across many servers. Each one is given its own polling time in the cron to wake up, process waiting data, and exit.

The Need:
I want to change from the cron-based system to a manager-style system that runs constantly, notices when work needs to be done, finds an available server from its list, and tells that server to do the work.

The Requirements:
I need it to be able to assign multiple jobs to the same server, and I need the manager to run in parallel across a few different servers in case one goes down (the standby managers can sleep as long as they know when they are needed). Finally, I want it to be in Perl, because that is what the rest of the code is written in.

(And finally) The Question(s):
Is anything like this currently available? If so, what is it, and how did it work for you? Any success stories? If not, how would you go about implementing it?

Thanks!

Replies are listed 'Best First'.
Re: Distributed Computing Anyone?
by kvale (Monsignor) on May 10, 2006 at 04:21 UTC
    One approach is to use OpenPBS, a portable batch system. It includes queueing, parallel processing, prioritization, and more. It is open source and works on Unix systems. There is a Perl binding, PBS::Client, that provides a nice interface:
    use PBS::Client;

    # Create a client object linked to a server
    my $client = PBS::Client->new;

    # Describe the job
    my $job = PBS::Client::Job->new(
        %job_options,    # e.g. queue => 'queue_1', mem => '800mb'
        cmd => \@commands,
    );

    # Optionally, re-organize the commands into a number of queues
    $job->pack(numQ => $numQ);

    # Submit the job
    $client->qsub($job);
    Alternatively, a pure Perl solution could be built around Schedule::Depend. Hmm, it was up to v2.6, but it doesn't seem to be on CPAN any more...

    Update: Fixed the OpenPBS link.

    -Mark

Re: Distributed Computing Anyone?
by bobf (Monsignor) on May 10, 2006 at 04:51 UTC

    I extended an existing system based on XML-RPC to do something similar (see the Frontier::Daemon and Frontier::Client modules for more information on setting up an XML-RPC system). Using that approach, you could do something like this:

    • Create a centralized queue server that accepts and queues jobs to be run. A system to track the completion of jobs could be built into this server. You will also need a way to track which machines are available and which are busy (i.e., already running the maximum number of jobs you allow).
    • When a job is completed, the worker can send an XML-RPC message back to the job-tracking server (both ends of this exchange are sketched below).
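
    A minimal sketch of the queue-server side, assuming Frontier::Daemon. The method names (submit_job, job_done), the port, and the in-memory queue are made up for illustration; a real server would also need the dispatch and machine-tracking logic described above:

        use strict;
        use warnings;
        use Frontier::Daemon;

        my @queue;    # pending job descriptions, oldest first

        # Accept any XML-RPC-encodable structure as a job description
        sub submit_job {
            my ($job) = @_;
            push @queue, $job;
            return scalar @queue;    # position in the queue
        }

        # Workers report completion here
        sub job_done {
            my ($job_id, $status) = @_;
            print "job $job_id finished: $status\n";
            return 1;
        }

        # new() enters the accept loop and does not return on success
        Frontier::Daemon->new(
            LocalPort => 8080,
            methods   => {
                submit_job => \&submit_job,
                job_done   => \&job_done,
            },
        ) or die "Couldn't start XML-RPC server: $!";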

    There are probably much more efficient ways to approach this, but the XML-RPC system worked well in our case because it fit into our existing setup (we already had one running) and it was extremely easy to implement during refactoring. In particular, data structures can be passed as parameters without having to serialize them onto the command line or write them to a data file. Those parameters could include data, input/output filenames, the number of machines to run on, the priority of the job, and so on.
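
    For example, submitting a job with a nested data structure as the parameter might look like this on the client side (the URL and method name match the hypothetical server sketched above):

        use strict;
        use warnings;
        use Frontier::Client;

        my $server = Frontier::Client->new(
            url => 'http://queue-host:8080/RPC2',
        );

        # A hash reference is marshalled into an XML-RPC struct automatically
        my $position = $server->call('submit_job', {
            cmd      => '/usr/local/bin/process_data.pl',
            input    => '/data/incoming/batch42',
            priority => 2,
        });

        print "queued at position $position\n";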

    If you need more complex communication (IPC, for example), XML-RPC is probably not the best choice.

    I'm sure more experienced and savvy monks will know of more efficient ways to approach this problem. I look forward to reading their responses.

Re: Distributed Computing Anyone?
by superfrink (Curate) on May 10, 2006 at 05:24 UTC
    Distributed::Process might help. To use it, you define a routine that does all of the work on the client side. The module takes care of the network communication and of starting the routine on all of your clients.

    You could write your own client programs using Net::Server. Then it's up to you to write a controlling program that contacts each of your client processes and starts the jobs running.
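
    A minimal sketch of such a worker, assuming Net::Server. The port and the one-command-per-line protocol are invented for illustration:

        package JobWorker;
        use strict;
        use warnings;
        use base 'Net::Server::Fork';    # fork one child per connection

        # Called for each connection; STDIN/STDOUT are tied to the socket
        sub process_request {
            my $self = shift;
            while (my $line = <STDIN>) {
                $line =~ s/\r?\n$//;
                last if $line eq 'quit';
                my $rc = system($line);    # run the requested command
                print(($rc == 0 ? 'OK' : 'FAILED'), " $line\r\n");
            }
        }

        JobWorker->run(port => 9000);

    The controlling program would then connect to each worker's port (IO::Socket::INET would do), send command lines, and read back the OK/FAILED status lines.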

    If you do not want (or need) the client processes to listen on the network all of the time, another option is to write a controlling program that logs into each of the client machines with SSH or RSH and runs a client script.
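
    A bare-bones sketch of that style of controller; the host names and script path are placeholders, and key-based ssh logins are assumed:

        #!/usr/bin/perl
        use strict;
        use warnings;

        my @hosts  = qw(worker1 worker2 worker3);
        my $script = '/usr/local/bin/process_data.pl';

        for my $host (@hosts) {
            my $rc = system('ssh', $host, $script);
            warn "$host exited with status ", $rc >> 8, "\n" if $rc != 0;
        }

    This runs the hosts serially; forking one child per host (or a module such as Parallel::ForkManager) would run them concurrently.
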
Re: Distributed Computing Anyone?
by GrandFather (Saint) on May 10, 2006 at 07:45 UTC

    How is the work to be done scheduled and otherwise managed? Do you need to be able to kill tasks once they start executing? Do you need to be able to remove tasks from the queue? Is it sufficient to have a single "reliable" system as the manager, or should the management be distributed in some fashion? Do you require task result reporting? Do the tasks have to contend for central resources?

    An "easy" way to do something like this is to use an email server to manage the queue. The task servers then check the queue from time to time and fetch tasks by reading and deleting an email from the queue. Because the email can contain pretty much anything it is easy to tailor tasks and distribute any information that is required for execution of the task.


    DWIM is Perl's answer to Gödel