sri has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I'm working on a Perl application which should run on multiple small hosts, some for storage and some for number crunching.

Now I'm searching for a clean and fast solution to interconnect these hosts.

The data that needs to be exchanged is somewhat complicated and fits best in structs/hashes, but it could also be reformatted to CSV or the like...

Ideally it should be as standards-conformant as possible and usable from languages other than Perl.

Currently I'm using SOAP::Lite for everything; it runs very smoothly and is comfortable to use, but it is very slow.
I'm using SOAP::Lite's mod_soap, so there is not much left to tune.

Some other solutions that came to my mind are:
- use of Storable and raw sockets
- some sort of home-brewed binary/ASCII protocol
- maybe even CORBA

What are your thoughts on my problem?

Re: Clustered Perl Applications?
by tachyon (Chancellor) on Jul 05, 2003 at 03:44 UTC

    MySQL is a pretty robust data storage solution, +/- Storable to serialize data if required. You can connect to it remotely over TCP/IP from your number-crunching machines and read and write data at will. As a bonus, just about all the code you will need is already written and has been thoroughly tested. It wraps the storage and transport into one well-tested package. (A rough sketch of this approach follows below.)

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
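    A minimal sketch of the approach tachyon describes, assuming a hypothetical `results` table with a BLOB column; the host, credentials, and table/column names are made up for illustration:

        use strict;
        use warnings;
        use DBI;
        use Storable qw(nfreeze thaw);

        # Connect to the remote MySQL box (host/db/credentials are placeholders).
        my $dbh = DBI->connect('DBI:mysql:database=jobs;host=storage1',
                               'user', 'password', { RaiseError => 1 });

        # Serialize an arbitrary Perl structure and store it in a BLOB column.
        my $result = { job_id => 42, values => [ 3.14, 2.71 ] };
        $dbh->do('INSERT INTO results (job_id, frozen) VALUES (?, ?)',
                 undef, $result->{job_id}, nfreeze($result));

        # Any other node can fetch and thaw it later.
        my ($frozen) = $dbh->selectrow_array(
            'SELECT frozen FROM results WHERE job_id = ?', undef, 42);
        my $restored = thaw($frozen);

    Note that nfreeze (rather than freeze) keeps the byte order network-neutral, which matters once the crunchers and storage boxes are different hardware.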

      I'm already using MySQL for storage and also for queuing on the number crunchers...

      The main problems with Storable are that I have to add some sort of protocol, because every host has different functions which must be triggered, and that it is nearly impossible to use from other languages.
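
      The "some sort of protocol" part can stay quite thin. One possible shape, sketched here with a made-up dispatch table (the function names and message fields are purely illustrative):

          use strict;
          use warnings;
          use Storable qw(nfreeze thaw);

          # Hypothetical dispatch table: each host registers the functions it offers.
          my %dispatch = (
              crunch => \&crunch_numbers,
              store  => \&store_result,
          );

          # Wrap the call in a tiny envelope before freezing it onto the wire.
          my $frozen = nfreeze({ func => 'crunch', args => [ 1, 2, 3 ] });

          # On the receiving host: thaw, look up, and call - or reject unknown names.
          my $msg  = thaw($frozen);
          my $code = $dispatch{ $msg->{func} }
              or die "unknown function '$msg->{func}'";
          $code->( @{ $msg->{args} } );

          sub crunch_numbers { my @n = @_; return scalar @n }   # placeholder
          sub store_result   { return }                         # placeholder

      This doesn't fix the cross-language problem, though - the envelope would have to move to a portable format for that.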

        If you can't coerce your data into CSV format (which is, after all, the native RDBMS format), just write a serialization class in all the languages you need. Alternatively, look at YAML or XML as the storage/transfer format (a rough YAML sketch follows below)....

        cheers

        tachyon

        s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
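        For the YAML route, the Perl side is short; here is a minimal sketch, with the structure contents invented for illustration:

            use strict;
            use warnings;
            use YAML qw(Dump Load);

            my $job = {
                job_id => 42,
                input  => [ 1.5, 2.5, 3.5 ],
            };

            # Dump to a plain-text string that any language with a YAML parser can read.
            my $text = Dump($job);

            # ...ship $text over the wire, then on the other side:
            my $copy = Load($text);

        The point of YAML over Storable here is exactly the cross-language concern: the receiving end no longer has to be Perl.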

Re: Clustered Perl Applications?
by dws (Chancellor) on Jul 05, 2003 at 06:37 UTC
    I'm working on a perl application which should run on multiple small hosts, some for storage and some for number crunching. ... Now I'm searching for a clean and fast solution to interconnect these hosts.

    "It depends". What do you mean by "interconnect"? Do you need to coordinate number crunching (say, by handing out subtasks from some central server)?

    I've worked on one load-balanced Perl application server that shared heavyweight data via the database tier (Oracle, in this case), with lightweight "event" propagation via sockets between the servers. The lightweight part would correspond to the "some sort of home-brewed binary/ASCII protocol" on your list. It worked fine for us.

    But to answer "what are your thoughts on my problem?" we would need to know more about your problem. Can you characterize the nature of the number crunching? (E.g., is the crunching coordinated between servers? At what level of granularity?) The nature of storage? (E.g., are stored computations shared between servers, or is storage write-only?)

      The number crunching is coordinated by a central node.
      It just hands out job ids; the number crunchers then fetch the data from the right storage server (storage servers hold ranges of job ids), crunch it, and send it to another storage server, where it is saved for later analysis.
      Storage servers are MySQL and always read-write.

        So the number-crunchers are fetching/storing data (using SOAP) from storage servers, which are fetching/storing the data from MySQL servers - or am I misunderstanding?

        Why do you need the intermediaries? Why not have the number-crunchers fetch/store directly to the MySQL servers?

        (apologies if I'm being dim - rather late here.)

Re: Clustered Perl Applications?
by IlyaM (Parson) on Jul 05, 2003 at 12:24 UTC
    Other alternatives: lots of interesting links on web services can be found on this page.

    Currently I'm working on a project which involves distributed web services, and after carefully examining SOAP and REST we have decided to go the REST way. One of the motivations is that SOAP is an extremely bloated standard which adds a lot of complexity and, in most cases, doesn't solve any problems that cannot be solved the REST way. Don't be fooled by the simplicity of the SOAP::Lite API - read the specs to get an idea of how complex it is.

    --
    Ilya Martynov, ilya@iponweb.net
    CTO IPonWEB (UK) Ltd
    Quality Perl Programming and Unix Support UK managed @ offshore prices - http://www.iponweb.net
    Personal website - http://martynov.org

      I know what a monster SOAP is.
      But I have to send large amounts of structured data in both directions.
      And as far as I know, that is not a strength of REST.
        Hmm, this is an ideal application for REST - your data can be converted to plain XML. This is parsed much faster than SOAP (despite SOAP also being XML) because there is one less layer to go through.
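
        To make the comparison concrete, a REST-style exchange needs nothing beyond LWP. A minimal sketch, in which the endpoint URL, port, and XML payload are all made up for illustration:

            use strict;
            use warnings;
            use LWP::UserAgent;
            use HTTP::Request;

            # Hypothetical endpoint: one URL per job id on the storage server.
            my $url = 'http://storage1:8080/jobs/42';
            my $xml = '<job id="42"><value>3.14</value></job>';

            my $ua = LWP::UserAgent->new;

            # POST the result as plain XML...
            my $req = HTTP::Request->new(POST => $url);
            $req->content_type('text/xml');
            $req->content($xml);
            my $res = $ua->request($req);
            die $res->status_line unless $res->is_success;

            # ...and later GET it back; any XML parser in any language can read it.
            my $fetched = $ua->get($url);
            print $fetched->content if $fetched->is_success;

        The whole "protocol" is then just the URL scheme plus the XML document format, both of which other languages can speak natively.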
Re: Clustered Perl Applications?
by naChoZ (Curate) on Jul 05, 2003 at 03:44 UTC
    You may take a click over to http://www.distributed.net and check out their way of doing things. I participated in the rc5-64 challenge for a few years. IIRC, the details of their operation are available for reading somewhere on the site.

    ~~
    naChoZ

      But the problems distributed.net is working on are very specialized - they are problems where nodes need no communication with each other, except for exchanging some information with a central node, and even that is not very much. Communication between nodes is certainly not a limiting factor for their kind of problem, but it appears to be for the OP.

      Abigail

Re: Clustered Perl Applications?
by jepri (Parson) on Jul 05, 2003 at 11:47 UTC
    There is a great protocol called BEEP (it's an RFC) which would do a lot of what you want. Unfortunately it is a C lib and hasn't been wrapped for Perl yet, although I've been thinking of doing it myself.

    I wrote an IPC module that works through network sockets and could easily be adapted for full network use. It transfers data structures using Storable, and they appear at the receiver without any work. It's on my PerlMonks page, or message me for more details.

    In any case, you will be using Storable, since it is an amazingly good module for serialising Perl data structures.
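
    jepri's actual module isn't shown in the thread, but the underlying idea is simple enough to sketch: freeze with Storable, prefix with a length so the receiver knows where the message ends, and thaw on the other side. Host and port here are placeholders:

        use strict;
        use warnings;
        use IO::Socket::INET;
        use Storable qw(nfreeze thaw);

        # --- sending host: connect, freeze, length-prefix, send ---
        my $out = IO::Socket::INET->new(
            PeerAddr => 'storage1',   # placeholder host
            PeerPort => 4242,         # placeholder port
            Proto    => 'tcp',
        ) or die "connect: $!";
        my $frozen = nfreeze({ job_id => 42, data => [ 1, 2, 3 ] });
        print $out pack('N', length $frozen), $frozen;

        # --- receiving host, given an accepted socket $in ---
        # Read the 4-byte length, then exactly that many bytes, then thaw.
        sub receive {
            my ($in) = @_;
            read($in, my $hdr, 4) == 4        or die "short read";
            my $len = unpack 'N', $hdr;
            read($in, my $body, $len) == $len or die "short read";
            return thaw($body);
        }

    The length prefix is what turns a raw byte stream into discrete messages; without it the receiver can't tell where one frozen structure ends and the next begins.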

    I usually advise against chucking everything into a database. There are lots of great reasons for using a database, but it should be a considered decision rather than the first thing you reach for. Databases have some great features, like automatically handling multiple clients while ensuring data integrity, but they are also slow compared to a solution that takes into account how the different nodes are going to use the data.

    ____________________
    Jeremy
    I didn't believe in evil until I dated it.

        Yeah, they all make a big thing about how it works, but then they are all nerds. BEEP is just yet another way to shove data from one app to another. It has nothing to do with SOAP, except that the promiscuous SOAP team appear to have gotten into bed with yet another transport protocol.

        BEEP doesn't care what data you send. If that data is SOAP data, great. But feel free to send your own data, using Storable or pack(), or XML. It's all the same in the end.

        ____________________
        Jeremy
        I didn't believe in evil until I dated it.

Re: Clustered Perl Applications?
by sgifford (Prior) on Jul 05, 2003 at 04:27 UTC
    What is slow about what you're doing now? Is it packing/unpacking the data, or transferring the data across the network, or loading the SOAP modules, or...?

    Doing a little profiling to figure out exactly what you're trying to improve could save you a ton of time.
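
    One quick way to start is to benchmark the suspected serialization step in isolation, before touching the network at all. A sketch using the core Benchmark module; the test structure is invented, and Data::Dumper is included only as a familiar point of comparison:

        use strict;
        use warnings;
        use Benchmark qw(cmpthese);
        use Storable qw(nfreeze thaw);
        use Data::Dumper;

        # A structure roughly the shape of the data being shipped around.
        my $data = { job_id => 42, values => [ map { $_ * 1.1 } 1 .. 500 ] };

        cmpthese(-2, {
            # Storable round trip: binary freeze and thaw.
            storable => sub { my $copy = thaw(nfreeze($data)) },
            # Data::Dumper round trip: stringify, then string-eval back.
            dumper   => sub {
                no strict 'vars';            # Dumper output assigns to $VAR1
                my $copy = eval Dumper($data);
            },
        });

    If the Storable round trip is orders of magnitude faster than what the SOAP layer is delivering end to end, that points the finger at the XML envelope rather than at the data itself.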

      As a side note, I would add that the SOAP::Lite implementation of the SOAP interface is very slow - there was recently a journal entry on http://use.perl.org that linked to a very comprehensive comparison of SOAP implementations across different languages and platforms. Despite its ease of use, SOAP::Lite fared very poorly in these tests, with high latency and poor throughput.

      I just wish I could find the link or journal entry so that I could link to it from here ... :-(

      Update (2004-03-11) - Found the link - http://www.caip.rutgers.edu/TASSL/Papers/p2p-p2pws02-soap.pdf

       

      perl -le 'print+unpack"N",pack"B32","00000000000000000000001001101111"'

        I had seen that journal entry too, and I have to agree.

        The bottleneck seems to be the packing/unpacking part, according to my little bit of profiling, with both XML::Parser and XML::Parser::Lite.
Re: Clustered Perl Applications?
by cleverett (Friar) on Jul 05, 2003 at 04:27 UTC
    Stem or POE?
Re: Clustered Perl Applications?
by tmiklas (Hermit) on Jul 05, 2003 at 20:33 UTC
    I would suggest POE. I once used it to implement distributed data analysis for an IRC bot - everything was written some time ago as a proof of concept, and it worked (woah!) until I lost it in a system crash (so where's my backup?) :-)
    Back to POE - the POE Cookbook in particular is worth reading...

    Greetz, Tom.
      Until now I haven't heard much about POE - what's the big deal with it?

      I took a look at the cookbook and the documentation, but to me it just looks like a very bloated Perl application server framework thingy; please correct me if I am wrong.

      Are there any reference projects using it out there?

      I personally like to use Apache/mod_perl for such things.

        Well - I wouldn't describe it as "very bloated". It's a very flexible system that allows you to throw together complex servers quickly. There are many organisations and projects using it successfully.

        So consider yourself corrected :-)

        Apache/mod_perl is nice - it's what I use myself most of the time - but not all applications fall easily into a stateless HTTP request/response framework.
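
        As for "very bloated": a working POE server is quite compact. A minimal line-oriented echo sketch using POE::Component::Server::TCP (the port number is arbitrary):

            use strict;
            use warnings;
            use POE qw(Component::Server::TCP);

            # A tiny line-oriented server: each client gets its input echoed back.
            POE::Component::Server::TCP->new(
                Port        => 12345,        # arbitrary port
                ClientInput => sub {
                    my ($heap, $input) = @_[HEAP, ARG0];
                    $heap->{client}->put("got: $input");
                },
            );

            POE::Kernel->run();

        Unlike the mod_perl request/response model, this process keeps per-connection state alive for as long as the client stays connected.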

Re: Clustered Perl Applications?
by scrubroot (Novice) on Jul 08, 2003 at 14:57 UTC
    If you are not tied to the idea of having Perl handle the actual inter-process communication, there may be another option that could prove useful: openMosix. Here's some text from the web site that best describes what it does:

    "…openMosix is a Linux kernel extension for single-system image clustering. This kernel extension turns a network of ordinary IA-32 computers into a supercomputer for Linux applications…."

    "…There is no need to program applications specifically for openMosix. Since all openMosix extensions are inside the kernel, every Linux application automatically and transparently benefits from the distributed computing concept of openMosix. The cluster behaves much as does a Symmetric Multi-Processor, but this solution scales to well over a thousand nodes which can themselves be SMPs."

    In theory, all you need to do is write your Perl code so that it forks off a process to do the number crunching and then returns the data when it finishes; the kernel patch will take care of locating an available machine and migrating the process there for you. Processing nodes can be added to or removed from a cluster without affecting running processes (other than their speed). The only downside is that this only works with Intel-compatible processors and Linux, so it can't be run on a Sun box. However, it is FREE, and it may improve the performance of your existing Perl code with little or no modification.
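
    The fork-and-migrate pattern described above might look like the following sketch; openMosix needs no Perl-visible API, since the kernel migrates the children transparently. The job ids and crunch function are stand-ins:

        use strict;
        use warnings;

        my @job_ids = (1 .. 8);
        my @pids;

        for my $job (@job_ids) {
            my $pid = fork();
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {
                # Child: pure CPU work; under openMosix the kernel may migrate
                # this process to an idle node without the code noticing.
                crunch($job);
                exit 0;
            }
            push @pids, $pid;
        }

        # Parent: wait for all children to finish.
        waitpid($_, 0) for @pids;

        sub crunch {
            my ($job) = @_;
            my $sum = 0;
            $sum += sqrt($_) for 1 .. 1_000_000;   # stand-in number crunching
            # ...write $sum to the storage tier keyed by $job here...
        }

    One caveat worth noting: migrated processes still do their I/O via the home node, so this pays off for CPU-bound children, not for ones that mostly shuffle data.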

    More info can be found on the web site here: http://openmosix.sourceforge.net/

    Rick