in reply to Queuing in multithread context

Reading this thread (no pun intended) through several times now, I still don’t understand how threading should enter into the picture ... at all. Threading is the sub-division of a single process, on a single computer. You are using a cluster of computers to work on a massive copy-job so that you do not overtax the I/O resources of any one machine. (Is my understanding correct, so far?)

If so, it appears to me that this work should be performed by instances of a single-threaded worker process (one or more per machine, depending on how “beefy” that machine is). They could simply be run from a Unix command-line, e.g. as background-jobs. And so, now, “the problem to be solved,” in a flexible and dynamic way, is: “what am I supposed to do next?”

If the right answer can be computed in advance, you could literally construct a shell-script consisting of one or more executions of this copy script ... dare I say, even an rsync command? ... and send it to each server so that it may execute it. Each server just runs through its script without considering anyone else, and when all of them are done, you’re done. The configuration file allows this process to be varied, but all scheduling decisions are made in advance. “Here are your instructions for today ... now, go do them.”
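If the schedule really can be computed up front, the dispatch step is almost trivial. A minimal sketch of the idea (the host names, source paths, and destinations below are all illustrative placeholders, not anything from the original question):

```python
# Precomputed plan: each server gets an ordered list of (src, dst) copies.
# Hosts and paths here are placeholders.
plan = {
    "server1": [("/data/a", "/backup/a"), ("/data/b", "/backup/b")],
    "server2": [("/data/c", "/backup/c")],
}

scripts = {}
for host, jobs in plan.items():
    lines = ["#!/bin/sh", "set -e"]        # abort on the first failed copy
    lines += [f"rsync -a {src}/ {dst}/" for src, dst in jobs]
    scripts[host] = "\n".join(lines) + "\n"
    with open(f"{host}.sh", "w") as f:     # one script per server
        f.write(scripts[host])

# Each script would then be shipped to its host and run there, e.g.:
#   scp server1.sh server1: && ssh server1 'sh server1.sh'
```

Each host then works through its own list without any coordination at run time; all the “scheduling” happened when the scripts were generated.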

Another approach would be to build these processes so that they open a socket-connection to some “grand marshal” process on a single machine, i.e. the one which has read the configuration file. They ask the grand marshal what they are to do next, and tell him what they have done, just by sending and receiving single text-lines through a socket. The grand marshal uses nothing more than a select() loop to manage all of the connections. (Other, higher-level network piping options are also available, but the difference is irrelevant to my suggestion.) The grand marshal is single-threaded, processing a stream of messages sequentially and sending back new orders to each worker until, by closing the connection, it indicates that the job is done. This would only be called for if the nature of the job (the hardware configuration, etc.) warrants dynamic, on-the-fly marshaling decisions.
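The grand-marshal idea can be sketched in very little code. Here is a toy version, with the workers simulated on socketpairs in the same process so the example is self-contained; the job strings and the one-line READY/DONE protocol are invented for illustration, not anything prescribed above:

```python
import select
import socket
import threading

def worker(sock, done):
    """Toy worker: announce readiness, then do each job it is handed
    until the marshal closes the connection."""
    sock.sendall(b"READY\n")
    f = sock.makefile("r")
    for line in f:                        # each line is one job order
        done.append(line.strip())         # a real worker would run the copy here
        sock.sendall(b"DONE\n")
    sock.close()

def run_marshal(conns, jobs):
    """Single-threaded dispatcher: one select() loop, no threads on
    the marshal side. Hands out one job per worker message; closes a
    connection when the queue is drained."""
    queue = list(jobs)
    live = set(conns)
    while live:
        readable, _, _ = select.select(list(live), [], [])
        for conn in readable:
            msg = conn.recv(1024)         # "READY" or "DONE"; content unused here
            if not msg:                   # worker hung up
                live.discard(conn)
            elif queue:
                conn.sendall((queue.pop(0) + "\n").encode())
            else:
                conn.close()              # no work left: signal completion
                live.discard(conn)

done = []
pairs = [socket.socketpair() for _ in range(2)]   # two simulated workers
threads = [threading.Thread(target=worker, args=(w, done)) for _, w in pairs]
for t in threads:
    t.start()
run_marshal([m for m, _ in pairs], ["copy /data/a", "copy /data/b", "copy /data/c"])
for t in threads:
    t.join()
```

In a real deployment the workers would of course connect over TCP from the other machines, but the marshal side would look essentially the same: one select() loop, one text line per decision.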

Replies are listed 'Best First'.
Re^2: Queuing in multithread context
by Hardin (Novice) on Jan 20, 2015 at 16:18 UTC

    Hi and thank you for your answer,

    I think you misinterpreted my description; I think I lost a lot of people on that, haha.

    There is no cluster or anything dispatching a heavy copy load. I basically need to execute several copy tasks sequentially on each of several servers, in parallel across the servers.

    I think this is a more straightforward description of what I'm trying to achieve. I think I introduced some confusion by talking about I/O and loads; basically, all I want is for the copy jobs to be executed one after the other on each server, and not all at the same time, which is what my first pattern was doing.

    Hope this is clearer? :)
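For the pattern described in this reply — several servers worked in parallel, but copies strictly sequential on each — one worker thread per server, each draining its own ordered task list, is enough. A minimal sketch (server names and task labels are placeholders; a real version would run rsync/scp where the log entry is appended):

```python
import threading

# One ordered list of copy tasks per server. Tasks on a server run
# strictly in sequence; the servers themselves proceed in parallel.
tasks_by_server = {
    "serverA": ["copy1", "copy2", "copy3"],
    "serverB": ["copy1", "copy2"],
}

log = []
lock = threading.Lock()

def run_server_queue(server, tasks):
    for task in tasks:                    # sequential on this server
        # a real implementation would execute the copy here
        with lock:
            log.append((server, task))

threads = [
    threading.Thread(target=run_server_queue, args=(srv, tasks))
    for srv, tasks in tasks_by_server.items()
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whatever order the servers interleave in, each server's own tasks come out in their original order, which is exactly the constraint stated above.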