Re: distributed computing
by philcrow (Priest) on Aug 30, 2005 at 18:25 UTC
|
Many people do this type of work with POE. The idea is that one POE process receives requests or periodically polls a resource (such as the file system). It can then do whatever it likes; in your case, handing the work off to subservient servers. POE is good at networked interactions like the ones you described.
See the perl.com articles: http://perl.com/pub/a/2004/07/02/poeintro.html and its follow-up http://perl.com/pub/a/2004/07/22/poe.html.
To find file sizes, use stat. Each stat call takes time, so you might want to cache the results in some way.
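A minimal sketch of that idea, assuming you only care about the size element of stat and that a simple in-memory hash is an acceptable cache (no invalidation when files change):

```perl
#!/usr/bin/perl
# Sketch: look up file sizes with stat() and memoize the results
# so each file is only stat()ed once.
use strict;
use warnings;

my %size_cache;

sub file_size {
    my ($path) = @_;
    # Return the cached size if we have already stat()ed this file.
    return $size_cache{$path} if exists $size_cache{$path};
    my $size = (stat $path)[7];    # element 7 of stat() is size in bytes
    $size_cache{$path} = $size if defined $size;
    return $size;
}
```

If the files can change size while queued, you would want to key the cache on mtime as well, or expire entries after a while.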
Phil | [reply] |
Re: distributed computing
by eric256 (Parson) on Aug 30, 2005 at 20:08 UTC
|
Many people handle problems differently based on their background. For interprocess communication, I like databases. After all, with a good database someone has already written the server and the row-locking code. I'm assuming you have some staging area that files go to first, then they are copied to specific folders, and then some process on the target machine works with each file.
So I would store each files name and size in a database. Along with that information you could store the machine ID of the machine in charge of that file.
Now you move the distribution burden to each process itself. This is nice because you no longer need a way for them to communicate directly. Here is a model of how this could work in your situation.
FileX arrives. It is entered into the db as ("FileX", 1123, 0), where 1123 is the size and 0 is the 'owner', i.e. the process responsible. The receiver's job is now complete.
Each of your servers then constantly runs a little script that does the following: query the sum of the file sizes all other machines are processing. If its own sum is the lowest, attempt to change the ProcessID on the file's database entry. If the update succeeds, it is now the owner; if it fails, some other machine grabbed the file already. UPDATE files SET ProcessID = ? WHERE id = ? AND ProcessID = 0; This is a nice operation because it only assigns the file to your process if the ProcessID is still 0.
Now the process can move the file into its folder and proceed on its merry way. You could incorporate CPU load into the decision about whether to grab a file: if the load is over X, wait Y seconds before deciding again.
Benefits of this approach: the server is already written, and the locking is already written. Storing the file sizes means you only ask a file its size once, and summing per machine becomes easy (SELECT ProcessID, SUM(file_size) FROM files WHERE ProcessID <> 0 GROUP BY ProcessID). It also means you could add or remove servers at will without changing any code at all.
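The claim step described above can be sketched with DBI. This uses an in-memory SQLite database as a stand-in for the real server; the table and column names follow the post, and $my_id is a hypothetical machine ID:

```perl
#!/usr/bin/perl
# Sketch of the optimistic claim: the UPDATE only succeeds if the
# file is still unowned (ProcessID = 0), so exactly one machine wins.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do('CREATE TABLE files (id INTEGER PRIMARY KEY,
                              name TEXT, file_size INTEGER,
                              ProcessID INTEGER)');
$dbh->do(q{INSERT INTO files VALUES (1, 'FileX', 1123, 0)});

my $my_id = 42;    # this machine's ID (made up for illustration)

sub claim {
    my ($file_id) = @_;
    # do() returns the number of rows changed; 1 means we won the race.
    my $rows = $dbh->do(
        'UPDATE files SET ProcessID = ? WHERE id = ? AND ProcessID = 0',
        undef, $my_id, $file_id);
    return $rows == 1;
}
```

The same statement works against any server with row-level atomic updates; only the DSN in connect() would change.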
I hope that gives you some insight into another way to look at the problem, even if you stick with your current approach.
| [reply] |
|
|
I like this idea. Does anyone know, offhand, how I can get MS SQL Server to look at current activity in other MS SQL Server databases on a local network?
| [reply] |
Re: distributed computing
by aplonis (Pilgrim) on Aug 30, 2005 at 18:52 UTC
|
Sounds like a job for XML-RPC and the Frontier modules from CPAN. At least so that the PCs will talk to one another. One PC could run the XML-RPC client and thereby manage any number of other PCs running the XML-RPC server.
There is an O'Reilly book on the topic. If you did it all in pure Perl, it would not even matter which of the PCs were Win32, which UNIX, etc.
I have a somewhat overblown example of the XML-RPC way of doing things recently listed in CUFP. Mine is encrypted and does different things than you want. But you could do something similar by following the book. Or you can steal from mine and just switch out the method calls, as appropriate. I shouldn't think it would be hard. My code is very heavily commented.
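A bare-bones sketch of the client/server pairing described above, using Frontier::Daemon and Frontier::Client from the Frontier::RPC distribution. The port, URL, and the 'add' method are placeholders made up for illustration; here both ends run in one script via fork just to keep it self-contained:

```perl
#!/usr/bin/perl
# Sketch: one PC manages another over XML-RPC. In real use the
# server block would run on each managed PC and expose the methods
# you actually need (copy a file, report load, etc.).
use strict;
use warnings;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: a managed PC runs the XML-RPC server.
    require Frontier::Daemon;
    Frontier::Daemon->new(
        LocalPort => 8222,
        methods   => { add => sub { my ($x, $y) = @_; return $x + $y } },
    );
    exit;
}

# Parent: the managing PC calls the worker over XML-RPC.
require Frontier::Client;
sleep 1;    # crude wait for the server to start listening
my $client = Frontier::Client->new(url => 'http://localhost:8222/RPC2');
my $sum    = $client->call('add', 2, 3);
kill 'TERM', $pid;
```

Since XML-RPC rides on plain HTTP, the Win32 and UNIX boxes interoperate without any extra work, as the post says.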
| [reply] |
Re: distributed computing
by InfiniteSilence (Curate) on Aug 30, 2005 at 18:17 UTC
|
How can you have code for F without E?
At any rate, here is an article on building a load-balancing daemon in Perl (I think the code is for Perl 4, but you should be able to hack it up to snuff). My guess is there is already a ton of better stuff out there, but this was the result of a 12-second Google search (e.g. PERL +DAEMON +(load balancing)), which you may have tried before the long post ;)
BTW: Here's the author's updated link (add 2 seconds to my previous Google search time).
Celebrate Intellectual Diversity
| [reply] [d/l] |
Re: distributed computing
by Roger (Parson) on Aug 31, 2005 at 17:46 UTC
|
Welcome to the world of super computing! Your computers are linked in a NOW (network of workstations) I presume. Cluster computing with Perl is very easy. Let me explain it...
The ideal configuration for a beowulf cluster is to have a head node (a node is a computer) that has access to the external network, and a private network to link up the rest of the nodes with the head node. The head node will have two network cards, one for the internal cluster, one for communicating with the rest of the world.
C--+--C
   |
C--+--C
   |
C--Hub--C (head)
         |
         |
---------+----------- Backbone LAN
This is the ideal world; I doubt that you want to configure your cluster in this manner.
The easier way is to connect all the computers via the backbone LAN into a NOW (network of workstations); no special hardware is required.
 C   C   C   C (head)
 |   |   |   |
 |   |   |   |
-+---+---+---+---- Backbone LAN
You will need to prepare your systems as follows:
1. have a shared storage area, such as an NFS mount.
2. have all the computers mount the same NFS share.
The next step is to install a parallel virtual machine. Check out the PVM (Parallel Virtual Machine) project.
You may want to set up ssh on all the machines to make PVM run better and more securely.
Go to CPAN and fetch the Parallel::Pvm module, which is the Perl interface to the parallel virtual machine.
PVM has pretty much everything you want for parallel computing, except distributed shared memory (which you really don't need anyway). Your code can be written in manager-worker style: the manager launches workers, and the launching process is transparent to your program. PVM decides which machine each worker runs on, depending on the work load.
By the way, the machines do not have to be of the same architecture or running the same OS. It's perfectly ok to run a cluster of linux/solaris/windows mixed nodes.
I am not going to say more here, I hope that you find this information useful, and take joy in doing research into PVM.
| [reply] [d/l] [select] |
|
|
I've been thinking about this problem again and I think that I am making it much more complex than it needs to be.
Each PC that is available to perform an operation on a flat file could dip into a single network folder at intervals and move the oldest flat file (request file) there to itself, preventing other PCs from using it. The grab would only take place when no other flat file is being processed on that machine, and the file taken would be the one that has been in the holding folder for the longest period. The interval at which each PC attempts a grab could be determined by CPU activity, and this would determine the probability of each machine taking on the task. The intervals at which the holding folder is checked could perhaps be set so that it is impossible for two PCs to attempt to take a file at the same time, although I have no idea how I would work that out. Can anyone help me with that? Two or more PCs taking a file at the same time is the only problem I can see with this.
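One way around the timing problem in the paragraph above: rather than trying to guarantee the polls never collide, let them collide and resolve the race with an atomic rename(). At most one renamer can succeed; the loser just moves on to the next file. A sketch, with made-up directory names (and assuming source and destination are on the same filesystem, where rename() is atomic):

```perl
#!/usr/bin/perl
# Sketch: claim the oldest file in a shared folder by renaming it
# into this machine's private folder. If two PCs race for the same
# file, exactly one rename() succeeds; the other harmlessly fails
# and tries the next file.
use strict;
use warnings;
use File::Spec;

sub claim_oldest {
    my ($incoming, $mine) = @_;
    opendir my $dh, $incoming or die "opendir $incoming: $!";
    my @files = grep { -f File::Spec->catfile($incoming, $_) } readdir $dh;
    closedir $dh;

    # Oldest first, by modification age (-M is age, so largest first).
    for my $name (sort { -M File::Spec->catfile($incoming, $b)
                         <=> -M File::Spec->catfile($incoming, $a) } @files) {
        my $src = File::Spec->catfile($incoming, $name);
        my $dst = File::Spec->catfile($mine, $name);
        return $name if rename $src, $dst;    # we won the race
    }
    return;    # nothing left to claim
}
```

Note that across different mounts (or on some network shares) rename() is not a single atomic step, so this trick is only safe when the holding folder and the per-PC folders live on the same exported filesystem.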
| [reply] |
Re: distributed computing
by samizdat (Vicar) on Aug 30, 2005 at 18:20 UTC
|
I don't know whether this will work with Windows, which it appears you are running, but I would suggest that you start with a solution where one machine acts as a dispatcher and uses Expect to tell the others what to do. I suspect you can make Expect work on Win2K, but not on the rest of them. Once you get proficient with Expect, you can try a few more tricks. | [reply] |