Many people handle problems differently based on their background. For interprocess communication, I like databases. After all, with a good database someone has already written the server and the row-locking code. I'm assuming you have some staging area that files go to first. Then they are copied to specific folders, and some process on the target machine works with each file.

So I would store each file's name and size in a database. Along with that information you could store the machine ID of the machine in charge of that file.

Now you move the distribution burden to each process itself. This is nice because you no longer need a way for them to communicate directly. Here is a model of how this would work in your situation.

FileX arrives. It is entered into the db as ("FileX", 1123, 0), where 1123 is the size and 0 is the 'owner', the process responsible (0 meaning nobody has claimed it yet). The receiver's job is now complete.
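To make the receiver's side concrete, here is a minimal sketch in Python with SQLite. The table layout, the column names other than ProcessID and file_size, and the helper names open_db and register_file are my assumptions for illustration; any database with atomic updates would do.

```python
import sqlite3

def open_db(path=":memory:"):
    """Open the database and create the (assumed) files table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL UNIQUE,"
        " file_size INTEGER NOT NULL,"
        " ProcessID INTEGER NOT NULL DEFAULT 0)"  # 0 means unclaimed
    )
    return conn

def register_file(conn, name, size):
    """Record a newly arrived file with no owner; returns its row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO files (name, file_size, ProcessID) VALUES (?, ?, 0)",
            (name, size),
        )
    return cur.lastrowid
```

The receiver would call register_file(conn, "FileX", 1123) once per arriving file and then forget about it.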

Each of your servers is then constantly running a little script that does the following. Query the total size of the files each machine is currently processing. If this machine's total is the lowest, attempt to change the ProcessID on the file's database entry. If it succeeds then it is now the owner. If it fails (the update touches zero rows) then some other machine grabbed it already. UPDATE files SET ProcessID = ? WHERE id = ? AND ProcessID = 0; This is a nice operation because it only assigns the file to your process if the ProcessID is still 0.
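The claim step might look roughly like this in Python with SQLite. The function name try_claim and the table layout are assumptions of mine; the UPDATE is the one from the post, and the affected-row count tells you whether you won the race.

```python
import sqlite3

# Assumed schema, only so the sketch runs on its own.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS files ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " file_size INTEGER NOT NULL,"
    " ProcessID INTEGER NOT NULL DEFAULT 0)"
)

def try_claim(conn, file_id, my_id):
    """Try to become the owner of file_id; True if this process got it."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "UPDATE files SET ProcessID = ? WHERE id = ? AND ProcessID = 0",
            (my_id, file_id),
        )
    # Exactly one row changes if the file was still unclaimed; zero rows
    # change if another machine claimed it first.
    return cur.rowcount == 1
```

Because the check (ProcessID = 0) and the assignment happen in a single statement, two machines cannot both succeed for the same file.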

Now the process can move the file into its folder and proceed on its merry way. You could also factor in CPU load when it's deciding whether to grab a file: if it's over X CPU, wait Y seconds before deciding again.
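A rough sketch of that back-off in Python follows. I am using the 1-minute load average from os.getloadavg (Unix only) as a stand-in for "CPU load", which is an assumption; the name wait_until_idle and its parameters are hypothetical, and the load reader can be swapped for whatever measure you prefer.

```python
import os
import time

def one_minute_load():
    """Default load reader: the 1-minute load average (Unix only)."""
    return os.getloadavg()[0]

def wait_until_idle(max_load, delay_seconds,
                    read_load=one_minute_load, sleep=time.sleep):
    """Block while the load is above max_load (the X in the post),
    sleeping delay_seconds (the Y) between checks.
    Returns how many times it had to wait."""
    waits = 0
    while read_load() > max_load:
        sleep(delay_seconds)
        waits += 1
    return waits
```

A server would call wait_until_idle(X, Y) just before it looks for a file to claim.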

Benefits of this are: the server is already written, and the locking is already written. Storing the file sizes means you only ask a file its size once, and summing folders becomes an easy query (SELECT ProcessID, SUM(file_size) FROM files WHERE ProcessID <> 0 GROUP BY ProcessID). It also means you could add or remove servers at will without changing any code at all.
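Here is how the per-machine totals and the "am I the least loaded?" check could be sketched in Python with SQLite. The names load_by_owner and is_least_loaded, the table layout, and the idea of passing in a list of known machine IDs are all assumptions on my part. The list is needed because a machine that owns no files does not show up in the GROUP BY result at all.

```python
import sqlite3

# Assumed schema, only so the sketch runs on its own.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS files ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " file_size INTEGER NOT NULL,"
    " ProcessID INTEGER NOT NULL DEFAULT 0)"
)

def load_by_owner(conn):
    """Map each owning ProcessID to the total size of its claimed files."""
    rows = conn.execute(
        "SELECT ProcessID, SUM(file_size) FROM files"
        " WHERE ProcessID <> 0 GROUP BY ProcessID"
    )
    return dict(rows.fetchall())

def is_least_loaded(conn, my_id, known_ids):
    """True if no machine in known_ids has a smaller total than my_id.
    Machines that own nothing count as a total of 0."""
    loads = load_by_owner(conn)
    mine = loads.get(my_id, 0)
    return all(mine <= loads.get(other, 0) for other in known_ids)
```

Ties count as "least loaded" for everyone involved, which is harmless here because the conditional UPDATE still lets only one machine claim a given file.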

I hope that gives you some insight into another way to look at the problem, even if you stick with your current approach.


___________
Eric Hodges

In reply to Re: distributed computing by eric256
in thread distributed computing by Win
