andrew732 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl Monks,

I'm trying to write a simple Perl script that will run on N hosts in a decentralized distributed system. The script will make a text file "foo.txt" and then query every other host in the system to see if the contents of "foo.txt" agree. Naturally, this has turned out to be much, much harder than it sounds.

Does anyone know how to do this with minimal effort on my part? I would very much prefer not to deal with database transactions, distributed file locking issues, proving that race conditions can't occur, and all the other nasty things that might be involved in making this work. The ideal would be some sort of CPAN module that does 95% of the work, but I would be OK with using non-Perl code or even another freestanding program that I could magically call with system(). I can't seem to find a good starting point though. Any ideas? Thanks!

Re: Distributed agreement woes
by Corion (Patriarch) on Jul 02, 2010 at 22:10 UTC

    You don't seem to have given this much thought. Otherwise, you'd specify who has the master copy, and how arbitration of differences would be handled.

    If you only want to find whether a copy is the same across all nodes, just distribute the file to all nodes and have them send notifications back. Quite easy.

      Not only have I given it a lot of thought, I think I described exactly what I want to do about as concisely as humanly possible. Notice that I said every host must independently construct the file, i.e., there is no master. Also notice that I don't care about how differences are handled; I just want each host to know whether it is different from any other host.
        But being concise isn't necessarily enough -- you should aim for concise and descriptive, like your last sentence: "I just want each host to know whether it is different from any other host". Examples can also help.

        Also, keep in mind that many monks are used to seeing short posts asking for help with no example code where the OP clearly did not put a lot of thought into it. See e.g. How to convert a shell script to Perl? (note that originally there were no code tags). If you are too concise, your post may look like one of those, especially to someone just skimming it. Obviously, that is not what you want.

        In the future, aim for a better balance between being concise and explaining some details. And many monks like to have an idea of what problem you are actually trying to solve (e.g. why you want your servers to detect if they are different).

        Elda Taluta; Sarks Sark; Ark Arks

Re: Distributed agreement woes
by ww (Archbishop) on Jul 03, 2010 at 01:09 UTC
    Concise and descriptive... good!
    ... precise and complete: also required, or, at least, a very good idea!

    The original post -- read very carefully -- arguably satisfies all of those or comes close (but the fact that you don't care what any differences are comes out only later). Nonetheless, a couple extra words would make the spec a lot clearer: for example,

    s/The script will make a text file "foo.txt" and then query every other host in the system to see if the contents of "foo.txt" agree./A copy of the script on each host will make a text file...if the content of that host's "foo.txt" agrees with the local copy./

    (And, BTW, what are your criteria for "agree"? If some of your hosts are *n*x and some 'doze, are variant line endings allowed? What about the timestamps?)
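
    To make the line-ending question concrete, here is a minimal sketch of one possible definition of "agree": normalize line endings, then compare digests. The helper name canonical_digest is made up for illustration, and the choice to ignore CR/LF differences is an assumption, not something the OP has specified. Timestamps are file metadata rather than content, so they never enter the digest in this version.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Hypothetical helper: reduce content to a canonical form before hashing,
# so hosts with different line-ending conventions can still "agree".
# Assumption: $content is a byte string and only line endings are normalized.
sub canonical_digest {
    my ($content) = @_;
    (my $canon = $content) =~ s/\r\n?/\n/g;   # CRLF or lone CR -> LF
    return sha256_hex($canon);
}

my $unix = "alpha\nbeta\n";
my $dos  = "alpha\r\nbeta\r\n";

print canonical_digest($unix) eq canonical_digest($dos)
    ? "agree\n"
    : "differ\n";
```

    If variant line endings should count as a difference, drop the substitution and hash the raw bytes instead.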

    Further, as some nodes in this thread make clear: those who would help would like to know what the triggering event is; whether you have a plan for having each of N hosts check with each of the others, and perhaps even 'how you intend to use a report of differences' since that might restrict (or broaden) your options.

Re: Distributed agreement woes
by ikegami (Patriarch) on Jul 02, 2010 at 22:23 UTC

    I don't see the problem. You must already have some means of identifying the members of the network and some means of communicating with them. Include the file or a hash of the file in the request sent to the hosts, and have them reply whether their file is the same or not. If all the hosts in the network reply affirmatively, then the file is/was the same across all hosts.
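
    A rough sketch of the comparison step described above, assuming the requester has already collected one digest per peer. The %replies hash is a hypothetical stand-in for whatever transport the hosts already use, and all_peers_agree is an illustrative name, not an existing module function.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Compare the local digest with the digest each peer reported.
# $replies maps peer name => digest (hypothetical; filled in by the
# network layer, which is not shown here).
# Returns a true/false flag and an array ref of the peers that differ.
sub all_peers_agree {
    my ($local_digest, $replies) = @_;
    my @differing = grep { $replies->{$_} ne $local_digest } sort keys %$replies;
    return (@differing == 0, \@differing);
}

my $local   = sha256_hex("contents of foo.txt\n");
my %replies = (
    hostB => sha256_hex("contents of foo.txt\n"),
    hostC => sha256_hex("something else\n"),
);

my ($ok, $differing) = all_peers_agree($local, \%replies);
print $ok ? "all hosts agree\n" : "differs from: @$differing\n";
```

    A peer that never replies is simply absent from %replies here, so a real script would also have to check that every known member answered before reporting agreement.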

      You're assuming there is a master. This is a decentralized system made up of peers and the peers must reach an agreement among themselves, a much trickier problem than it seems at first.
        No, I didn't. Your requirement states some script wants to know if all machines are in sync. Therefore, you have a requester. That's all my method requires.

Re: Distributed agreement woes
by ruzam (Curate) on Jul 03, 2010 at 02:43 UTC

    "Decentralized distributed system" sounds a lot like a clustered file system to me. That may be because I've never used a clustered file system, so I probably don't understand all the gritty details. But if you had a shared clustered (decentralized) folder that all hosts can see and write to, then each host could generate a unique "foo.host.txt" file in that shared folder (without worrying about locks and race conditions), scan the directory for every other foo.host.txt file (file name pattern match), and apply whatever 'agree' logic you desire. You either trust the 'other' host file or ignore it, maybe with a verified start and end key to let you know you've read the whole file top to bottom without it (or any underlying buffers) changing on you. The file system should take care of the rest of any buffering/locking issues.

    Just thinking out loud. Depending on your definition of 'agree', reading host files may be the least of your worries.
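
    The shared-folder idea above could be sketched roughly as follows. Everything here is an assumption for illustration: the marker strings, the function names (publish, read_complete, differing_hosts), and the use of a temporary directory standing in for the clustered folder. "Agree" is taken to mean byte-for-byte equality.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical start/end keys framing each file so a reader can tell
# whether it has seen the whole thing.
my ($BEGIN, $END) = ("--BEGIN--\n", "--END--\n");

# Write this host's copy into the shared folder as foo.<host>.txt.
sub publish {
    my ($dir, $host, $content) = @_;
    my $path = File::Spec->catfile($dir, "foo.$host.txt");
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} $BEGIN, $content, $END;
    close $fh or die "Cannot close $path: $!";
}

# Return the payload of a file, or undef if either marker is missing.
sub read_complete {
    my ($path) = @_;
    open my $fh, '<', $path or return;
    my $data = do { local $/; <$fh> };
    close $fh;
    return unless defined $data;
    return $data =~ /\A\Q$BEGIN\E(.*)\Q$END\E\z/s ? $1 : undef;
}

# List the other hosts whose complete file differs from ours.
sub differing_hosts {
    my ($dir, $me, $mine) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my @names = sort grep { /\Afoo\..+\.txt\z/ } readdir $dh;
    closedir $dh;

    my @diff;
    for my $name (@names) {
        my ($host) = $name =~ /\Afoo\.(.+)\.txt\z/;
        next if $host eq $me;
        my $theirs = read_complete(File::Spec->catfile($dir, $name));
        next unless defined $theirs;      # incomplete file: ignore it
        push @diff, $host if $theirs ne $mine;
    }
    return @diff;
}

my $shared = tempdir(CLEANUP => 1);       # stands in for the clustered folder
publish($shared, 'hostA', "same\n");
publish($shared, 'hostB', "same\n");
publish($shared, 'hostC', "other\n");

my @diff = differing_hosts($shared, 'hostA', "same\n");
print @diff ? "hostA differs from: @diff\n" : "hostA agrees with everyone\n";
```

    Files without both markers are silently skipped in this sketch; whether to skip, retry, or treat them as a disagreement depends on the definition of 'agree'.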