Re: distributed computing

Many people handle problems differently based on their background. For interprocess communication, I like databases. After all, with a good database someone has already written the server and the row-locking code. I'm assuming you have some staging area that files go to first. Then they are copied to specific folders. Then some process on the target machine works with that file.

So I would store each file's name and size in a database. Along with that information, you could store the ID of the machine in charge of that file.
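To make that concrete, here is a minimal sketch of such a table. I'm using Python's built-in sqlite3 module purely as a stand-in for whatever database you actually run (MS SQL Server in your case). The `name` column and the `create_schema` helper are my own hypothetical choices; `files`, `id`, `file_size` and `ProcessID` follow the names used in this post.

```python
import sqlite3

def create_schema(conn: sqlite3.Connection) -> None:
    """Create the files table: one row per incoming file (column names are assumptions)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS files (
            id        INTEGER PRIMARY KEY,         -- key used by the claim UPDATE
            name      TEXT    NOT NULL,            -- file name as it arrived
            file_size INTEGER NOT NULL,            -- bytes, recorded once on arrival
            ProcessID INTEGER NOT NULL DEFAULT 0   -- owning machine, 0 = unclaimed
        )
        """
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
create_schema(conn)
# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column
columns = [col[1] for col in conn.execute("PRAGMA table_info(files)")]
print(columns)  # ['id', 'name', 'file_size', 'ProcessID']
```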

Now you move the distribution burden to each process itself. This is nice because you no longer need a way for them to communicate directly. Here is a model of how this would work in your situation.

FileX arrives. It is entered into the db as ("FileX", 1123, 0), where 1123 is the size and 0 is the 'owner', the process responsible (0 meaning nobody has claimed it yet). The receiver's job is now complete.
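A sketch of the receiver side under the same assumptions: sqlite3 stands in for the real server, `register_file` is a hypothetical helper, and a throwaway temp directory plays the staging area. It asks the file its size once, inserts the row with ProcessID 0, and is done.

```python
import os
import sqlite3
import tempfile

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT NOT NULL,"
    " file_size INTEGER NOT NULL, ProcessID INTEGER NOT NULL DEFAULT 0)"
)

def register_file(conn: sqlite3.Connection, path: str) -> int:
    """Receiver side: record the file as unowned (ProcessID = 0) and return its id."""
    size = os.path.getsize(path)  # ask the file its size exactly once
    cur = conn.execute(
        "INSERT INTO files (name, file_size, ProcessID) VALUES (?, ?, 0)",
        (os.path.basename(path), size),
    )
    conn.commit()
    return cur.lastrowid  # after this the receiver's job is finished

with tempfile.TemporaryDirectory() as staging:
    path = os.path.join(staging, "FileX")
    with open(path, "wb") as f:
        f.write(b"x" * 1123)  # a 1123-byte file, as in the example above
    file_id = register_file(conn, path)

row = conn.execute(
    "SELECT name, file_size, ProcessID FROM files WHERE id = ?", (file_id,)
).fetchone()
print(row)  # ('FileX', 1123, 0)
```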

Each of your servers then constantly runs a little script that does the following. Query the total size of the files each machine is processing. If this machine's total is the lowest, attempt to change the ProcessID on the file's database entry. If that succeeds, it is now the owner. If it fails, then some other machine grabbed it already. The update looks like this:

UPDATE files SET ProcessID = ? WHERE id = ? AND ProcessID = 0;

This is a nice operation because it only assigns the file to your process if the ProcessID is still 0.
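Here is a minimal sketch of just the claim step, again with sqlite3 standing in for the real database. The machine IDs 7 and 9 and the `try_claim` name are hypothetical; the one real requirement is that no machine ever uses ID 0, since 0 means unclaimed. Checking the cursor's `rowcount` is how the caller learns whether its UPDATE won.

```python
import sqlite3

# Hypothetical schema standing in for the real files table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT NOT NULL,"
    " file_size INTEGER NOT NULL, ProcessID INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO files (name, file_size) VALUES ('FileX', 1123)")
conn.commit()

def try_claim(conn: sqlite3.Connection, my_id: int, file_id: int) -> bool:
    """Compare-and-set: take ownership only if nobody owns the file yet.

    The WHERE clause re-checks ProcessID = 0 inside the UPDATE itself, so the
    database's row locking decides any race; only one caller sees rowcount 1.
    """
    cur = conn.execute(
        "UPDATE files SET ProcessID = ? WHERE id = ? AND ProcessID = 0",
        (my_id, file_id),
    )
    conn.commit()
    return cur.rowcount == 1

first = try_claim(conn, 7, 1)   # machine 7 gets there first
second = try_claim(conn, 9, 1)  # machine 9 finds ProcessID already set to 7
owner = conn.execute("SELECT ProcessID FROM files WHERE id = 1").fetchone()[0]
print(first, second, owner)     # True False 7
```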

Now the process can move the file into its folder and proceed on its merry way. You could also factor CPU load into its decision to grab a file: if it's over X% CPU, wait Y seconds before deciding again whether to grab a file.
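The CPU check could be a small gate the script passes through before it looks for work. This is a sketch only: `MAX_LOAD` and `BACKOFF_SECONDS` are hypothetical stand-ins for X and Y, and the load reading is passed in as a function so the logic doesn't depend on any particular OS API. The sleep function is injectable too, which makes it easy to test.

```python
import time
from typing import Callable

MAX_LOAD = 0.75      # hypothetical "X": fraction of total CPU capacity
BACKOFF_SECONDS = 5  # hypothetical "Y"

def wait_until_idle(read_load: Callable[[], float],
                    max_load: float = MAX_LOAD,
                    backoff: float = BACKOFF_SECONDS,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Block until read_load() is at or below max_load; return how many times we backed off."""
    waits = 0
    while read_load() > max_load:
        sleep(backoff)  # too busy: wait Y seconds before reconsidering
        waits += 1
    return waits

# Usage with fake readings: busy, busy, then idle enough to go look for a file.
readings = iter([0.95, 0.90, 0.40])
slept = []
waits = wait_until_idle(lambda: next(readings), sleep=slept.append)
print(waits, slept)  # 2 [5, 5]
```

On Unix-like systems one possible `read_load` is `lambda: os.getloadavg()[0] / (os.cpu_count() or 1)`; `os.getloadavg` is not available on Windows, so there you would need some other source for the reading.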

Benefits of this are: the server is already written, and the locking is already written. Storing the file sizes means you only ask a file for its size once, and summing folders becomes an easy query (SELECT ProcessID, SUM(file_size) FROM files WHERE ProcessID <> 0 GROUP BY ProcessID). It also means you could add or remove servers at will without changing any code at all.
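The per-machine totals also give you the "is it the lowest?" test from the claim step. A sketch, still with sqlite3 as a stand-in, hypothetical sample rows, and hypothetical helper names. A machine that owns nothing does not appear in the GROUP BY result at all, so two idle machines can both decide they are lowest; that is harmless, because the conditional UPDATE lets only one of them win a given file. It is also why no list of servers has to be maintained anywhere.

```python
import sqlite3
from typing import Dict

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT NOT NULL,"
    " file_size INTEGER NOT NULL, ProcessID INTEGER NOT NULL DEFAULT 0)"
)
# Hypothetical sample data: machine 1 owns 1200 bytes, machine 2 owns 300,
# and one file is still unclaimed (ProcessID 0).
conn.executemany(
    "INSERT INTO files (name, file_size, ProcessID) VALUES (?, ?, ?)",
    [("a", 500, 1), ("b", 700, 1), ("c", 300, 2), ("d", 999, 0)],
)
conn.commit()

def load_by_machine(conn: sqlite3.Connection) -> Dict[int, int]:
    """Bytes currently owned by each machine that owns at least one file."""
    rows = conn.execute(
        "SELECT ProcessID, SUM(file_size) FROM files"
        " WHERE ProcessID <> 0 GROUP BY ProcessID ORDER BY ProcessID"
    ).fetchall()
    return dict(rows)

def is_least_loaded(conn: sqlite3.Connection, my_id: int) -> bool:
    """True if no other machine currently owns fewer bytes than my_id."""
    totals = load_by_machine(conn)
    mine = totals.pop(my_id, 0)  # a machine with no files has load 0
    return all(mine <= other for other in totals.values())

totals = load_by_machine(conn)
print(totals)                    # {1: 1200, 2: 300}
print(is_least_loaded(conn, 3))  # True: machine 3 owns nothing yet
print(is_least_loaded(conn, 1))  # False: machine 2 is lighter
```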

I hope that gives you some insight into another way to look at the problem, even if you stick with your current approach.

___________
Eric Hodges

Re^2: distributed computing by Win (Novice) on Aug 31, 2005 at 14:43 UTC
I like this idea. Does anyone know, offhand, how I can get MS SQL Server to look at current activity in other MS SQL Server databases on a local network?
Re^3: distributed computing by eric256 (Parson) on Aug 31, 2005 at 15:28 UTC
You probably only want one database server. The work done by the processes is really what you want to distribute. So you would have one machine that accepts incoming files and logs them into the database, then multiple processes (maybe even one ON the db machine) that check the database and do the work.