Hi folks

I have been given a task that involves making the most of my computing resources with minimal overhead. I have a single text file that is to be split into many equal-sized chunks (up to thousands), and a single command to be run on each of them. The command is a small Perl script that analyzes the chunk's data and writes its results to another file (whose name is an argument to the script).

Until now I have passed such jobs off to Sun Grid Engine (SGE) as a job array, and life has been good. In this case I cannot do that and must build my own job manager instead. Here's why:

SGE will not tell me when a job is complete, whether it worked, failed, etc. This in and of itself is not a deal-breaker because I already have handlers built into my SGE-calling code.

SGE overhead - I don't think this should be a consideration, but my boss does not want it either way.

The main point here is that the system must be self-contained and able to run on networks that do not have SGE or any similar system. That's the biggie.

The quick overview: I have a big file and a script to run it through. I need to write a handler script that breaks the file up, throws the individual jobs at a bunch of big servers, gets told when each job is done, and then cats all the output files into one big result file.
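To make the overview concrete, here is a rough sketch of just the split and merge steps as I currently picture them. It assumes the chunks can be cut on line boundaries with a fixed number of lines per chunk, which may not hold for every data format. The sub names `split_file` and `merge_files` and the `prefix.00000` chunk naming are placeholders of mine, not existing code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split $input into chunk files of at most $lines_per_chunk lines each.
# ASSUMPTION: records are single lines, so cutting on line boundaries is safe.
# Returns the chunk file names in order.
sub split_file {
    my ($input, $lines_per_chunk, $prefix) = @_;
    open my $in, '<', $input or die "Cannot read $input: $!";
    my (@chunks, $out);
    my $count = 0;
    while (my $line = <$in>) {
        if ($count % $lines_per_chunk == 0) {
            # Start a new chunk file (hypothetical naming: prefix.00000, ...).
            close $out if $out;
            my $name = sprintf '%s.%05d', $prefix, scalar @chunks;
            open $out, '>', $name or die "Cannot write $name: $!";
            push @chunks, $name;
        }
        print {$out} $line;
        $count++;
    }
    close $out if $out;
    close $in;
    return @chunks;
}

# Concatenate the per-chunk result files into $result, in the order given.
sub merge_files {
    my ($result, @parts) = @_;
    open my $out, '>', $result or die "Cannot write $result: $!";
    for my $part (@parts) {
        open my $in, '<', $part or die "Cannot read $part: $!";
        print {$out} $_ while <$in>;
        close $in;
    }
    close $out or die "Cannot close $result: $!";
    return;
}
```

The handler would call `split_file` once, run the analysis script on every chunk, and then pass the per-chunk output files to `merge_files` in the same order so the final result lines up with the original input.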

My question is this: without reinventing the wheel, does anyone have advice to point me in the right direction? The closest I've come to a starting point is using RPC calls, but I'm not sure whether that is the best idea. Also, a little more about the system: there will be a main script which will be started on a compute server. Each compute server will be given N jobs at a time, with N depending on the size of the input file chunks. It is possible, but maybe sub-optimal, to start one server program on each compute server for every job it can handle at any given time. Maybe this can even be done without breaking it into a client/server architecture, though I don't yet see how.
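For the "N jobs at a time" part, the simplest thing I can picture without a client/server split is plain fork/exec with a cap on running children, roughly as below. This is only a sketch under assumptions: `run_jobs` is a made-up name, it runs everything on the local machine, and reaching other servers would mean making each command something like an `ssh` invocation, which in turn assumes password-less SSH to the compute servers.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# Run each command (an array ref: program followed by its arguments) with
# at most $max_jobs children alive at once.
# Returns a hash mapping job index => raw wait status ($?); 0 means success.
sub run_jobs {
    my ($max_jobs, @commands) = @_;
    my (%running, %status);    # %running maps child pid => job index
    my $next = 0;
    while ($next < @commands || %running) {
        # Start new jobs while there is a free slot.
        while ($next < @commands && scalar(keys %running) < $max_jobs) {
            my @cmd = @{ $commands[$next] };
            my $pid = fork();
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {
                # Child: replace ourselves with the job, bypassing the shell.
                exec { $cmd[0] } @cmd or POSIX::_exit(127);
            }
            $running{$pid} = $next++;
        }
        # Parent: block until any child finishes, then record how it ended.
        my $pid = wait();
        last if $pid == -1;                # no children left to wait for
        next unless exists $running{$pid};
        my $job = delete $running{$pid};
        $status{$job} = $?;
    }
    return %status;
}
```

The caller can then check which jobs came back with a non-zero status and resubmit only those, which covers the "tell me when it's done and whether it worked" requirement for the local case.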

Sorry for the wordy description; I will be happy to clarify anything that I can.

-- Thanks, feloniousMonk

In reply to Task distribution project Q by feloniousMonk
