Re: Clustered Perl Applications?
by tachyon (Chancellor) on Jul 05, 2003 at 03:44 UTC
MySQL is a pretty robust data storage solution, +/- Storable to serialize data if required. You can connect to it remotely over TCP/IP from your number crunching machines and read and write data at will. As a bonus, just about all the code you will need is already written and thoroughly tested. It wraps the storage and transport into one well tested package.
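A minimal sketch of the idea (the DSN, table, and column names are invented for illustration - adapt to your schema):

    use DBI;
    use Storable qw(nfreeze thaw);

    # Connect to the remote MySQL box from a number cruncher.
    my $dbh = DBI->connect('DBI:mysql:database=jobs;host=storage1',
                           'user', 'pass', { RaiseError => 1 });

    # Serialize an arbitrary Perl structure into a BLOB column.
    my $result = { job_id => 42, values => [ 3.14, 2.71 ] };
    $dbh->do('INSERT INTO results (job_id, data) VALUES (?, ?)',
             undef, $result->{job_id}, nfreeze($result));

    # Any other Perl host can pull it back out and thaw it.
    my ($blob) = $dbh->selectrow_array(
        'SELECT data FROM results WHERE job_id = ?', undef, 42);
    my $restored = thaw($blob);

Using nfreeze rather than freeze keeps the byte order portable between hosts.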
cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
I'm already using MySQL for storage, and also for queuing on the number crunchers...
The main problems with Storable are that I would have to add some sort of protocol on top, because every host has different functions which must be triggered, and that it is nearly impossible to use it from other languages.
If you can't coerce your data into CSV format (which is, after all, the native RDBMS interchange format), just write a serialization class in all the languages you need. Alternatively, look at YAML or XML as the storage/transfer format....
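A YAML round trip is only a few lines, and the text it produces can be consumed from other languages too (the job structure here is just an example):

    use YAML qw(Dump Load);

    # Serialize a job description to language-neutral text...
    my $job  = { id => 42, op => 'crunch', args => [ 1, 2, 3 ] };
    my $text = Dump($job);

    # ...and read it back (or hand $text to a non-Perl consumer).
    my $back = Load($text);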
cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
Re: Clustered Perl Applications?
by dws (Chancellor) on Jul 05, 2003 at 06:37 UTC
I'm working on a perl application which should run on multiple small hosts, some for storage and some for number crunching. ... Now I'm searching for a clean and fast solution to interconnect these hosts.
"It depends". What do you mean by "interconnect"? Do you need to coordinate number crunching (say, by handing out subtasks from some central server)?
I've worked on one load-balanced Perl application server that shared heavyweight data via the database tier (Oracle, in this case), with lightweight "event" propagation via sockets between the servers. The lightweight part would correspond to "some sort of home brew binary/ASCII protocol" on your list. It worked fine for us.
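The lightweight part can be as small as a one-line notification over TCP; a sketch, with a made-up host, port, and message format:

    use IO::Socket::INET;

    # Heavyweight data lives in the database tier; this just tells a
    # peer server "something changed, go look".
    my $peer = IO::Socket::INET->new(
        PeerAddr => 'app2.example.com',
        PeerPort => 9000,
        Proto    => 'tcp',
    ) or die "connect: $!";
    print $peer "INVALIDATE job 42\n";
    close $peer;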
But to answer "what are your thoughts on my problem?" we would need to know more about your problem. Can you characterize the nature of the number crunching? (E.g., is the crunching coordinated between servers? At what level of granularity?) The nature of storage? (E.g., are stored computations shared between servers, or is storage write-only?)
The number crunching is coordinated by a central node.
It just hands out job IDs; the number crunchers then fetch the data from the right storage server (storage servers hold ranges of job IDs), crunch it, and send the result to another storage server, where it is saved for later analysis.
Storage servers are MySQL and always read/write.
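Picking the right storage server from a job ID is just a range lookup, something like this (ranges and host names invented):

    # Hypothetical job-id ranges, one entry per storage server.
    my @ranges = (
        { min => 1,       max => 99_999,  host => 'storage1' },
        { min => 100_000, max => 199_999, host => 'storage2' },
    );

    sub storage_for {
        my ($job_id) = @_;
        for my $r (@ranges) {
            return $r->{host}
                if $job_id >= $r->{min} && $job_id <= $r->{max};
        }
        die "no storage server holds job $job_id";
    }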
So the number-crunchers are fetching/storing data (using SOAP) from storage servers, which are fetching/storing the data from MySQL servers - or am I misunderstanding?
Why do you need the intermediaries? Why not have the number-crunchers fetch/store directly to the MySQL servers?
(apologies if I'm being dim - rather late here.)
Re: Clustered Perl Applications?
by IlyaM (Parson) on Jul 05, 2003 at 12:24 UTC
Other alternatives: a lot of interesting links on web services can be found on this page.
Currently I'm working on a project which involves distributed web services, and after carefully examining SOAP and REST we have decided to go the REST way. One of the motivations is that SOAP is an extremely bloated standard which adds a lot of complexity and in most cases doesn't solve any problems that cannot be solved the REST way. Do not be fooled by the simplicity of the SOAP::Lite API - read the specs to get an idea of how complex it is.
--
Ilya Martynov, ilya@iponweb.net
CTO IPonWEB (UK) Ltd
Quality Perl Programming and Unix Support
UK managed @ offshore prices - http://www.iponweb.net
Personal website - http://martynov.org
I know what a monster SOAP is.
But I have to send large amounts of structured data in both directions.
And as far as I know, this is not a strength of REST.
Hmm, this is an ideal application for REST - your data can be converted to plain XML, which is parsed much faster than SOAP (despite SOAP also being XML) because there's one less layer to go through.
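In practice that can be as simple as PUTting an XML document at a job URL; a sketch with invented hosts and payloads:

    use LWP::UserAgent;
    use HTTP::Request;

    # Plain XML over HTTP - no SOAP envelope, no extra layer.
    my $ua  = LWP::UserAgent->new;
    my $req = HTTP::Request->new(
        PUT => 'http://storage1.example.com/jobs/42' );
    $req->content_type('text/xml');
    $req->content('<job id="42"><input>1 2 3</input></job>');

    my $res = $ua->request($req);
    die 'PUT failed: ', $res->status_line unless $res->is_success;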
Re: Clustered Perl Applications?
by naChoZ (Curate) on Jul 05, 2003 at 03:44 UTC
You may take a click over to http://www.distributed.net and check out their way of doing things. I participated in the rc5-64 challenge for a few years. IIRC, the details of their operation are available for reading somewhere on the site.
~~
naChoZ
But the problems distributed.net are working on are very specialized problems - they are problems where nodes need no communication with each other, except for exchanging some information with a central node, and even that is not very much. Communication between nodes is certainly not a limiting factor for their kind of problems, but it appears to be the case for the OP.
Abigail
Re: Clustered Perl Applications?
by jepri (Parson) on Jul 05, 2003 at 11:47 UTC
There is a great protocol called BEEP (it's an RFC), which would do a lot of what you want. Unfortunately it is a C lib and hasn't been wrapped for Perl yet, although I've been thinking of doing it myself.
I wrote an IPC module that works through network sockets and could easily be adapted for full network use. It transfers data structures using Storable, and they appear at the receiver without any work. It's on my perlmonk page, or message me for more details.
In any case, you will be using Storable, since it is an amazingly good module that serialises Perl data structures.
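The idea in a nutshell (this is a sketch, not my module itself - host, port, and message shape are invented):

    use IO::Socket::INET;
    use Storable qw(nstore_fd fd_retrieve);

    # Sender: freeze a structure straight onto the socket.
    my $sock = IO::Socket::INET->new(
        PeerAddr => 'cruncher1.example.com',
        PeerPort => 7000,
        Proto    => 'tcp',
    ) or die "connect: $!";
    nstore_fd( { cmd => 'crunch', job => 42 }, $sock );

    # Receiver (in the peer process): the structure just appears,
    # no parsing required.
    # my $msg = fd_retrieve($client_socket);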
I usually advise against chucking everything into a database. There are lots of great reasons for using a database, but it should be a considered decision, rather than the first thing you reach for. Databases have some great features, like automatically handling multiple clients while ensuring data integrity, but they are also slow compared to a solution that takes into account how different nodes are going to use the data.
____________________
Jeremy
I didn't believe in evil until I dated it.
Yeah, they all make a big thing about how it works, but then they are all nerds.
BEEP is just yet another way to shove data from one app to another. It has nothing to do with SOAP, except that the promiscuous SOAP team appear to have got into bed with yet another transport protocol.
BEEP doesn't care what data you send. If that data is SOAP data, great. But feel free to send your own data, using Storable or pack(), or XML. It's all the same in the end.
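For the pack() option, a tiny fixed binary record is enough (the field layout here is invented for illustration):

    # 32-bit job id, 16-bit opcode, 8-byte ASCII tag.
    my $msg = pack 'N n A8', 42, 7, 'CRUNCH';
    my ( $id, $op, $tag ) = unpack 'N n A8', $msg;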
____________________
Jeremy
I didn't believe in evil until I dated it.
Re: Clustered Perl Applications?
by sgifford (Prior) on Jul 05, 2003 at 04:27 UTC
What is slow about what you're doing now? Is it packing/unpacking the data, or transferring the data across the network, or loading the SOAP modules, or...?
Doing a little profiling to figure out exactly what you're trying to improve could save you a ton of time.
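A quick way to separate the packing cost from the transport cost is to micro-benchmark just the serialization step (the data shape here is invented):

    use Benchmark qw(timethese);
    use Storable qw(nfreeze thaw);

    my $data   = { map { $_ => [ 1 .. 100 ] } 1 .. 100 };
    my $frozen = nfreeze($data);

    # If these are fast, the bottleneck is the network or module load.
    timethese( 1_000, {
        freeze => sub { nfreeze($data) },
        thaw   => sub { thaw($frozen) },
    } );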
As a side note, I would add that the SOAP::Lite implementation of the SOAP interface is very slow - there was recently a journal entry on http://use.perl.org that linked through to a very comprehensive comparison of SOAP implementations across different languages and platforms. Despite its ease of use, SOAP::Lite fared very poorly in these tests, with high latency and poor throughput.
I just wish I could find the link or journal entry so that I could link to it from here ... :-(
Update (2004-03-11) - Found the link - http://www.caip.rutgers.edu/TASSL/Papers/p2p-p2pws02-soap.pdf
perl -le 'print+unpack"N",pack"B32","00000000000000000000001001101111"'
I had seen that journal entry too, and I have to agree.
The bottleneck seems to be the packing/unpacking part, according to my own little profiling - with both XML::Parser and XML::Parser::Lite.
Re: Clustered Perl Applications?
by tmiklas (Hermit) on Jul 05, 2003 at 20:33 UTC
I would suggest POE. I've used it once for implementing distributed data analysis for an IRC bot - everything was written some time ago as a proof of concept, and it worked (woah!) until I lost it in a system crash (so where's my backup?) :-)
Back to POE - especially the POE Cookbook is worth reading...
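A minimal POE session, just to show the event-driven shape (the event names here are invented):

    use POE;

    POE::Session->create(
        inline_states => {
            _start => sub {
                # Queue our first event as soon as the session starts.
                $_[KERNEL]->yield( crunch => 42 );
            },
            crunch => sub {
                my ( $kernel, $job_id ) = @_[ KERNEL, ARG0 ];
                print "crunching job $job_id\n";
            },
        },
    );
    POE::Kernel->run;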
Greetz, Tom.
Until now I haven't heard much about POE - what's the big deal with it?
I took a look at the cookbook and the documentation, but to me it just looks like a very bloated Perl application server framework thingy; please correct me if I am wrong.
Are there any reference projects using it out there?
I personally like to use Apache/mod_perl for such things.
Well - I wouldn't describe it as "very bloated". It's a very flexible system that allows you to throw together complex servers quickly. There are many organisations and projects using it successfully.
So consider yourself corrected :-)
Apache/mod_perl is nice - it's what I use myself most of the time - but not all applications fall easily into a stateless HTTP request/response framework.
Re: Clustered Perl Applications?
by scrubroot (Novice) on Jul 08, 2003 at 14:57 UTC
If you are not tied to the idea of having Perl handle the actual inter-process communication, there may be another option that could prove useful: openMosix. Here's some text from the web site that best describes what it does:
"…openMosix is a Linux kernel extension for single-system image clustering. This kernel extension turns a network of ordinary IA-32 computers into a supercomputer for Linux applications…."
"…There is no need to program applications specifically for openMosix. Since all openMosix extensions are inside the kernel, every Linux application automatically and transparently benefits from the distributed computing concept of openMosix. The cluster behaves much as does a Symmetric Multi-Processor, but this solution scales to well over a thousand nodes which can themselves be SMPs."
In theory, all you need to do is write your Perl code so that it forks off a process to do the number crunching and then returns the data when it finishes; the kernel patch will take care of locating an available machine and migrating the process to it for you. Processing nodes can be added to or removed from a cluster without affecting running processes (other than their speed). The only downside is that this only works with Intel-compatible processors and Linux, so it can't be run on a Sun box. However, it is FREE, and may be able to improve the performance of your existing Perl code with little or no modification.
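The Perl side stays completely ordinary; a sketch of the fork-per-job layout (crunch() is a hypothetical placeholder for the real work):

    my @jobs = ( 1 .. 8 );
    my @pids;

    for my $job (@jobs) {
        my $pid = fork;
        die "fork: $!" unless defined $pid;
        if ( $pid == 0 ) {
            # Child: an ordinary Linux process, which openMosix may
            # transparently migrate to whichever node has spare CPU.
            crunch($job);
            exit 0;
        }
        push @pids, $pid;
    }
    waitpid $_, 0 for @pids;    # parent collects the children

    sub crunch { my ($job) = @_; }    # hypothetical worker routine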
More info can be found on the web site here: http://openmosix.sourceforge.net/
Rick