PerlMonks  

Re: Looking for a resource management / job queue module

by Marshall (Canon)
on Jul 21, 2012 at 15:33 UTC ( [id://982990] )


in reply to Looking for a resource management / job queue module

This could get to be pretty complicated if you require a fancy scheduling algorithm - but it could be that something fairly simple would work well enough to get started.

It sounds like you already have the concept of a central "cop" program that starts these various tests and you need a resource manager for it to keep track of the resources.

You could use a DB to track resources, but one common way is to create a series of zero- or one-byte files, each file representing one of the resources. A resource is free if the "cop" program can acquire an exclusive lock (write lock) on its file, and it stays in use for as long as that lock is held. Release the lock when the test is over. If the "cop" program dies, all of its locks are released automatically (a file lock is an in-memory structure held by the operating system, not something stored on the disk). This way you don't have to clean up a DB on a restart.
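
A minimal sketch of the lock-file idea, written in Python rather than Perl and assuming a POSIX system where `fcntl.flock` is available (Perl's built-in `flock` with `LOCK_EX | LOCK_NB` from `Fcntl` works the same way). The lock directory and the `<resource>.lock` naming are hypothetical. A non-blocking exclusive lock either succeeds, meaning the resource was free and is now in use, or fails immediately, meaning someone else holds it.

```python
import fcntl
import os


def try_acquire(lock_dir, resource):
    """Try to lock the (zero-byte) file that stands for `resource`.

    Returns the open file object, which must be kept open to hold the lock,
    or None if another holder already has the resource.
    """
    path = os.path.join(lock_dir, resource + ".lock")  # hypothetical naming scheme
    handle = open(path, "a")  # "a" creates the file if it does not exist yet
    try:
        # Exclusive + non-blocking: fail at once instead of waiting.
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def release(handle):
    """Unlock the resource; closing the file alone would also drop the lock."""
    fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
    handle.close()
```

Because the lock lives in the kernel and not in the file contents, a crashed manager leaves nothing behind to clean up: the next `try_acquire` on that resource simply succeeds.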

Your Perl program keeps a table of who is using what. The hard part is a job mix like this: test1 uses a couple of the resources, test2 needs all of them, and test3 uses a couple of the resources (different ones than test1). With the queue in the order test1, test2, test3, a simple scheduler starts test1, then has to wait for test2's resources, so test3 waits too even though its resources are free. If you want test1 and test3 to run in parallel and then run test2, that requires "more smarts" than just running down the queue sequentially and waiting until resources are available for the next test. If the queue order were different (test1, test3, test2), then the same simple algorithm would run 1 and 3 together and then run 2 once both 1 and 3 had finished. How "smart" the scheduler needs to be depends upon the job mix and other factors (like how important maximal efficiency is and how long these various tests run). Maybe some of the tests that only need a couple of resources run a long time and the one that needs them all is fast; I don't know.
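
The ordering effect can be checked with a small simulation. This is a hypothetical sketch (in Python, with made-up resource names `A` to `D` and made-up run times) of the simple strict-FIFO policy described above: a job starts only when all of its resources are free, and nothing may start ahead of a blocked job at the head of the queue. Time only advances when a running job finishes.

```python
def fifo_schedule(jobs, pool):
    """Simulate strict FIFO scheduling.

    jobs: list of (name, set_of_resources, run_time) in queue order.
    pool: set of all resource names.
    Returns {name: start_time}.
    """
    free = set(pool)
    running = []          # list of (finish_time, name, resources)
    queue = list(jobs)
    now = 0
    start = {}
    while queue or running:
        # Start jobs from the head of the queue while their resources are free.
        while queue and queue[0][1] <= free:
            name, need, run_time = queue.pop(0)
            free -= need
            running.append((now + run_time, name, need))
            start[name] = now
        if not running:
            # Nothing is running, so everything is free: the head job asks for
            # resources that are not in the pool at all.
            raise ValueError(f"job {queue[0][0]!r} needs resources outside the pool")
        # Jump to the next finishing job and give its resources back.
        running.sort(key=lambda r: r[0])
        now, _, need = running.pop(0)
        free |= need
    return start


pool = {"A", "B", "C", "D"}
test1 = ("test1", {"A", "B"}, 10)
test2 = ("test2", {"A", "B", "C", "D"}, 5)
test3 = ("test3", {"C", "D"}, 10)

print(fifo_schedule([test1, test2, test3], pool))
# {'test1': 0, 'test2': 10, 'test3': 15}
print(fifo_schedule([test1, test3, test2], pool))
# {'test1': 0, 'test3': 0, 'test2': 10}
```

With the first order, test3 waits behind test2 even though C and D sit idle from the start; with the second order, test1 and test3 overlap and the whole run finishes at time 15 instead of 25.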

Sorry if this wasn't much help, but maybe you will get some ideas. You could "roll your own" simple manager and just see how well (or how poorly) it works out in practice. The job queue could just be a "drop directory" with files that describe the jobs. Try FIFO first and see how it works out. Increase complexity as needed.
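
A drop directory can be read in FIFO order with very little code. In this hypothetical sketch (Python; the separate `claimed` directory, the sortable file-name prefix, and a job-file format that simply lists resource names are all assumptions), a job is claimed by renaming its file out of the drop directory. On POSIX systems a rename within one file system is atomic, so if several managers ever poll the same directory, only one of them can win a given job.

```python
import os


def claim_next_job(drop_dir, claimed_dir):
    """Claim the oldest job in drop_dir, FIFO by file name.

    Assumes job files are named with a sortable prefix (e.g. "0001-test1")
    and contain whitespace-separated resource names. claimed_dir must be a
    different directory on the same file system.
    Returns (job_name, [resource, ...]) or None if the queue is empty.
    """
    for name in sorted(os.listdir(drop_dir)):
        src = os.path.join(drop_dir, name)
        if not os.path.isfile(src):
            continue  # skip subdirectories and other non-job entries
        dst = os.path.join(claimed_dir, name)
        try:
            os.rename(src, dst)  # atomic: exactly one claimer succeeds
        except FileNotFoundError:
            continue  # someone else claimed this job first
        with open(dst) as f:
            return name, f.read().split()
    return None
```

Submitting a job is then just writing a new file into the drop directory, and the claimed files double as a record of what has been picked up.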

Sorry that I am not aware of a CPAN module that does all of this, but that doesn't mean that such a thing doesn't exist! Maybe your simple resource control's "enough resources now, y/n?" check can be combined with an existing job queue module. I presume that would have the effect of running jobs that require fewer resources at a "higher priority" than ones that require more? Anyway, I recommend starting simple and measuring how well it works.
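
One way to combine the "enough resources now, y/n?" check with a queue is to ask it of every waiting job rather than only the one at the head. The hypothetical sketch below (Python, with made-up jobs test1 to test3, resources A to D, and run times) simulates that policy, and it shows the effect presumed above: jobs that need fewer resources tend to start first.

```python
def first_fit_schedule(jobs, pool):
    """Simulate scheduling where any queued job may start as soon as its
    resources are free, regardless of its position in the queue.

    jobs: list of (name, set_of_resources, run_time) in queue order.
    pool: set of all resource names.
    Returns {name: start_time}.
    """
    free = set(pool)
    running = []          # list of (finish_time, name, resources)
    queue = list(jobs)
    now = 0
    start = {}
    while queue or running:
        # Ask "enough resources now, y/n?" for every waiting job, in queue order.
        for job in list(queue):
            name, need, run_time = job
            if need <= free:
                queue.remove(job)
                free -= need
                running.append((now + run_time, name, need))
                start[name] = now
        if not running:
            raise ValueError(f"job {queue[0][0]!r} needs resources outside the pool")
        # Jump to the next finishing job and give its resources back.
        running.sort(key=lambda r: r[0])
        now, _, need = running.pop(0)
        free |= need
    return start


pool = {"A", "B", "C", "D"}
jobs = [("test1", {"A", "B"}, 10),
        ("test2", {"A", "B", "C", "D"}, 5),
        ("test3", {"C", "D"}, 10)]
print(first_fit_schedule(jobs, pool))
# {'test1': 0, 'test3': 0, 'test2': 10}
```

The flip side of this policy is that a steady stream of small jobs can keep a job like test2 waiting indefinitely (starvation), so the implicit priority for small jobs may need a limit, for example a boost for jobs that have waited a long time.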

"reserving" some of the resources in advance without being able to acquire all the resources at the same time can lead to "deadlocks". Sorry if I wasn't more help. The general problem for maximal efficient use of resources is difficult (at least for me). But I am hoping that something simple will "move the ball forward" and perhaps even allow developer's to inject other tests into the nightly run's mix of regression tests (software folks are known "night owls").

Re^2: Looking for a resource management / job queue module
by elTriberium (Friar) on Jul 23, 2012 at 17:47 UTC

    Thanks, this was helpful. I'm thinking about writing this myself, but there are a lot of corner cases to take care of (What if a resource goes down or is reserved by someone else? What if a job never finishes? What if I need to scale this up and support multiple "job submit nodes"?). That's why I was hoping for an existing solution.

    There are a lot of grid schedulers (Condor, Sun Grid Engine forks, Torque, etc.), but the problem I see with most of them is that they assume they control the actual jobs and start and stop the individual processes. That's not the case in our environment, where we already have the "control job" (basically a customized version of the TAP::Parser module).

Re^2: Looking for a resource management / job queue module
by renormalist (Sexton) on Oct 23, 2012 at 14:14 UTC

    It sounds like a perfect use case for Tapper.

    There we have a scheduler that maintains HOSTS and QUEUES. A queue usually represents a test use case (like "linux-stable", "linux-rc", etc.). You put test requests into a queue, including a "requested host features" spec, and let the scheduler decide which queue to serve next based on the queues' bandwidths and the available hosts. Test requests can "re-queue" themselves to create a continuous rotation of the use cases.
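
    To make the idea of "bandwidths" concrete, here is a hypothetical Python sketch, not Tapper's actual code or API: each queue gets a made-up integer weight, a smooth weighted round-robin picks among the queues that currently have waiting requests, and a request only goes to a free host whose feature set covers its "requested host features". Tapper's real scheduler may well use a different policy.

```python
def make_queue_picker(bandwidths):
    """Smooth weighted round-robin over queue names.

    bandwidths: {queue_name: positive integer weight} (hypothetical values).
    Returns pick(candidates), which chooses one of the given non-empty queues.
    """
    credit = {name: 0 for name in bandwidths}

    def pick(candidates):
        # Every competing queue earns credit proportional to its bandwidth...
        for name in candidates:
            credit[name] += bandwidths[name]
        chosen = max(candidates, key=lambda name: credit[name])
        # ...and the winner pays back what was handed out this round.
        credit[chosen] -= sum(bandwidths[name] for name in candidates)
        return chosen

    return pick


def find_host(free_hosts, requested_features):
    """Return a free host whose features include all requested ones, else None.

    free_hosts: {host_name: set_of_features} (hypothetical data).
    """
    for host, features in free_hosts.items():
        if requested_features <= features:
            return host
    return None


pick = make_queue_picker({"linux-stable": 2, "linux-rc": 1})
print([pick(["linux-stable", "linux-rc"]) for _ in range(6)])
# ['linux-stable', 'linux-rc', 'linux-stable', 'linux-stable', 'linux-rc', 'linux-stable']
print(find_host({"box1": {"x86_64"}, "box2": {"x86_64", "kvm"}}, {"kvm"}))
# box2
```

    With weights 2 and 1, linux-stable gets two of every three slots while both queues have work, and the picks are interleaved rather than bunched together. A request that re-queues itself would simply be appended to its queue again after it runs.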

    Setting up Tapper with all features (as used in the OSRC, where we set up machines from scratch with other distributions and Xen/KVM setups) can be a bit tricky, but you seem to be OK with using ssh.

    See http://renormalist.net/misc/ for public material about it.

    Let me know if you have already found another solution. Otherwise, I could help you set up a Tapper instance step by step.
