Hey all,

I'm in the middle of designing/writing a script for work that involves varying a subset of a large set of numbers many times, then feeding the entire set of numbers through a pre-written analysis engine, which takes around 15 minutes to complete its analysis each time. The subset of numbers, and the amount they vary, changes with each cycle of the script.

We have no shortage of computers and/or processors in my workplace, and what I'd like to do is distribute the analysis over more than one computer (well, processor - some computers have multiple) in the department. I've designed a script to do just that, which is below, and involves forking off a child process for each call to the analysis engine, rsh-ing to a different computer and running the engine there.

Note that @data is an array of arrays, each containing the set of numbers to be analysed, and that @computers is a list of computers available to me. This code is untested; I'm sure there's a couple of bugs somewhere - as I said, I'm designing atm, implementing comes later, I hope.

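my $runningprocesses = 0;
foreach my $element (@data) {
  # choose the host before forking so the child knows where to run
  my $host = $computers[$runningprocesses % @computers];
  my $pid = fork();
  die "fork failed: $!" unless defined $pid;
  if ($pid == 0) { # in child process
    system("rsh $host 'analyse @$element'");
    exit;
  }
  ++$runningprocesses; # in parent: one more child to wait for
}

while ($runningprocesses > 0) {
  wait; # wait for all the children to return
  --$runningprocesses;
}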

While I'm fairly sure this'd do what I need, it doesn't seem particularly elegant - and I'm wondering if anyone has any suggestions about better ways to either distribute work or to manage child processes. There will be at least ten calls to the analysis engine, and as runtime is limited (the script will be run regularly), distribution becomes something of a necessity.
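For comparison, here is a minimal sketch of one other way the child bookkeeping could go. It's not a tested solution: it keeps at most one job per host and reaps children with waitpid, so a host is reused as soon as its job finishes. The host names, the sample data, and the helpers build_command and run_all are all hypothetical, and a local no-op command stands in for the rsh call so the sketch runs anywhere.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: real host names and data sets would replace these.
my @computers = qw(hostA hostB hostC);
my @data      = map { [ $_, $_ * 2, $_ * 3 ] } 1 .. 7;

# Build the command for one job. The real script would return something like
# ('rsh', $host, "analyse @$set"); a local no-op keeps this sketch runnable.
sub build_command {
    my ($host, $set) = @_;
    return ($^X, '-e', 'exit 0');
}

sub run_all {
    my ($hosts, $sets) = @_;
    my @free  = @$hosts;   # hosts with no job running
    my @queue = @$sets;    # data sets still to be analysed
    my %host_of;           # pid => host that child is using
    my @failed;            # [host, exit code] for non-zero exits
    my $done = 0;

    while (@queue or %host_of) {
        # start jobs while there is both work and a free host
        while (@queue and @free) {
            my $set  = shift @queue;
            my $host = shift @free;
            my $pid  = fork();
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {    # in child process
                my @cmd = build_command($host, $set);
                exec @cmd or die "exec failed: $!";
            }
            $host_of{$pid} = $host;
        }
        # block until any child exits, then hand its host back
        my $pid = waitpid(-1, 0);
        last if $pid == -1;     # no children left to wait for
        next unless exists $host_of{$pid};
        my $host = delete $host_of{$pid};
        push @failed, [ $host, $? >> 8 ] if $? != 0;
        push @free, $host;
        ++$done;
    }
    return ($done, \@failed);
}

my ($done, $failed) = run_all(\@computers, \@data);
print "completed $done jobs, ", scalar(@$failed), " failed\n";
```

The reason for a free-host list over the modulus is that a slow machine never ends up with several jobs queued on it at once. One caveat: classic rsh generally doesn't pass back the remote command's exit status, so the failure check here would only catch problems with rsh itself unless the engine reports success some other way.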

Any advice, suggestions, insults, or whatever people want to throw at me would be greatly appreciated. This is my first foray into anything like this (distribution, not forking and certainly not Perl) and I'm sure there's a better way.

Thanks in advance
-- Foxcub

In reply to Child Process Management and Distributed Systems by Tanalis
