Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi!

My question is not directly related to Perl and I currently don't have any code to show, but people here have helped me with algorithms a lot in the past, and I plan on implementing this in Perl.

Let's suppose that there are two users, Alice and Bob, who have sets of machines/addresses A1..An and B1..Bm respectively. It's not guaranteed that any pair of machines can talk to each other at any given moment, because Alice and Bob take their machines offline while they're not using them, but machines which are online can interconnect freely.

Alice and Bob want to exchange messages and want to access all message history from any of their devices. What algorithm should they use to achieve that?

My thinking is along the following lines:

Of course, it's not going to work.

  1. Alice sends Bob a picture of her cat trying to get inside a box. The moment the message is delivered, a TLA agent tries to shoot Bob with a laser rifle. Bob escapes alive and unharmed, but his smartphone pays with its life for him and won't show him the picture of the cat. Now Alice has the message marked as delivered, so she won't retry sending it, but none of Bob's computers have it. Bob will never know whether the cat achieved its goal or not. The TLA has won.
  2. Machine B1 is Bob's home PC, B2 is his work PC and B3 is his smartphone (he bought another one). Alice sends him a message from Ai while he's at home, so Ai connects to B1, and B1 forwards the message to B3. Even though B3 delivers the message to B2 afterwards, both B1 and B2 will keep the message marked as "still needs to be delivered" forever, because they can't talk to each other.

To get around (1) Bob needs a way to ask Alice to restart the delivery of messages he does not know about. Alice: "But what messages don't you know of?" Bob: "I don't know!". Should Bob periodically ask Alice whether she sent him messages later than <timestamp of the last message from Alice>? Should Alice ask Bob's new smartphone whether it has the message with <timestamp of the message Alice sent to Bob's old smartphone>?

To get around (2) (both the m*(m-1) connections to forward a message instead of (m-1), and storing "undelivered" messages forever), a forwarded message should contain information about which computers have already received it. But this would solve the problem only partway: while B2 will know that it shouldn't forward the message any further because B3 and B1 already have it, B1 won't know that and will keep trying to connect to B3. Another example: Claire has 4 computers which interconnect 100% of the time, so delivering a message to C1 causes 3 useful connections from C1 to C2..C4 and another 6 redundant, useless ones between C2..C4. At this point I'm close to reimplementing TCP ACKs and TTLs, which means I'm going to over-engineer the protocol, and it will still break under some rare condition I can't think of right now. And this kind of redundancy, while causing 8 times more useless traffic than useful (9*8=72 useless vs. 9 useful connections) in the case of 10 interconnecting machines, will still be required when these machines come online only sequentially and in pairs (as in a party game).
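A rough sketch of that seen-list forwarding, just to make the idea concrete (send_to() and the peer list are hypothetical placeholders, not an existing API):

    # Sketch: forward a message together with the list of nodes known to have it.
    # This reproduces the partial fix described above, including its weakness:
    # a node not on the list will still be contacted redundantly.
    use strict;
    use warnings;

    sub forward_to_peers {
        my ( $self, $msg, @peers ) = @_;        # $msg->{seen_by} is an arrayref of node names
        my %already_has = map { $_ => 1 } @{ $msg->{seen_by} };

        push @{ $msg->{seen_by} }, $self        # add ourselves before forwarding further
            unless $already_has{$self};

        for my $peer (@peers) {
            next if $peer eq $self || $already_has{$peer};   # don't contact nodes from the list
            send_to( $peer, $msg );             # hypothetical transport call
        }
    }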

The question: what algorithms (even better, with well-tested implementations on CPAN) should Alice and Bob search for, if they want to merge their instant messages between their machines, some of which may not directly connect to each other? And thank you for your patience if you have stayed with me for the duration of the post.


Re: [OT] Merging messages between users on different machines
by BrowserUk (Patriarch) on Jun 21, 2015 at 16:40 UTC

    Seems to me, without having done any kind of exhaustive analysis, that in any system of multiple sources to multiple destinations; there is only one universal frame of reference: time. (Ignoring relativity :)

    Any counters you employed would need to be synchronised, and thus leave windows of opportunity for clashes and mismatches.

    You say "But using only timestamps will sooner or later result in a collision", but unless Alice (or Bob) is in the habit of typing messages on two different devices simultaneously, there is no scope for timestamp collisions originating from one individual.

    Unless that one account could be legitimately used by a group of persons; in which case, each device should add its device number to the timestamp to ensure uniqueness.

    Of course, neither timestamps alone, nor timestamps + deviceId, give any mechanism for recipients to detect they have missed a communication.

    I only see three possible approaches to that:

    1. Guaranteed delivery; a la tcp .v. udp:

      If sent messages required positive acknowledgements; and acknowledgements also required positive acknowledgements (look up "triple-handshake"), messages would be resent by their originator until it received an ack; and recipients would discard or disregard duplicate receipts until they received acks to their acks.

      And yes. It's recursive, so needs a bailout and start again clause.

    2. Swap (or augment) the deviceId with a device counter:

      If each sender (device) maintains a monotonically increasing count of outbound comms with every recipient device; then when a recipient receives a message from a given sender, it can check to see if it missed anything since the last message it received from that sender.

      Of course, if the sender sends a message that fails to deliver; and never sends another; the recipient will never know it missed something.

    3. Master senders and/or master receivers:

      If each group of devices chose and maintained an official server device, and all outbound and inbound comms were redirected through that device, it could maintain sender/receiver pair counts that could serve to uniquely identify communications; but the logistics are horrible.

      It would require regular handshakes between the current primary and all the other devices; and there would need to be an automated fail-over mechanism; and an automated fail-over-fail handover mechanism....

      In fact. I'm regretting ever mentioning a "third option" :)

    All in all, treating every device to device communication as a separate id (deviceId+deviceId+count) seems to be the only sure way of detecting missed communications in a timely and guaranteed manner.
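    A minimal sketch of that deviceId+deviceId+count idea, assuming an in-memory hash for the counters (a real device would persist them):

        # Each device keeps a per-peer outbound counter; a receiver can then spot
        # gaps in the sequence and know it missed something from that sender.
        use strict;
        use warnings;

        my %out_count;      # $out_count{$peer_id} = last number sent to that peer
        my %last_seen;      # $last_seen{$peer_id} = last number received from that peer

        sub next_message_id {
            my ( $my_id, $peer_id ) = @_;
            my $n = ++$out_count{$peer_id};
            return "$my_id:$peer_id:$n";         # deviceId + deviceId + count
        }

        sub note_received {
            my ($msg_id) = @_;
            my ( $from, $to, $n ) = split /:/, $msg_id;
            my $expected = ( $last_seen{$from} // 0 ) + 1;
            warn "missed messages $expected..@{[ $n - 1 ]} from $from\n" if $n > $expected;
            $last_seen{$from} = $n if $n > ( $last_seen{$from} // 0 );
        }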

    And the process of sharing and consolidating communications between groups of devices allocated to any given userid is a secondary process that happens on a "best endeavour" basis.

    This allows for the user of a group of devices to eventually be aware of all communications and any missed ones; but it leaves a time window during which they may be unaware of the true state of the conversation.

    That time window of incomplete information is potentially open-ended; but that will always be the case when any device could fail permanently at any given time.

    Not much in the way of a useful answer there; but maybe food for thought.



      First of all, thank you very much. It took quite some time to digest your advice :)

      A "message - ACK - ACK-to-ACK" model represents a dialog very well. Combining sending messages with synchronisation attempts (Hi Bob, my version of dialog between us consists of 1239 messages, there's a new one attached: "..." - Hi Alice, now I have 1239 messages too, thanks!) takes care of the case when Bob had had one device with the message from Alice and the device is lost: they'll sync it next time any of them sends a message to the other.

      To speed up the synchronisation process in common cases, the ability to "suggest" messages may be useful. For example, when node Ak has just received a new message from Ai and wants to forward it to another node Al, Ak can't be sure that Al doesn't already have this message (maybe Al has been contacted by Ai too); so Ak launches the synchronisation algorithm and suggests (sends the messageID of) the new message as the one that may be missing; if Al doesn't have a message with this ID, it requests the message, which solves the problem at least partially; if it does, that's ~20 bytes wasted. I even thought that the node which has the most messages should always suggest the last (nmax-nmin) messages known to it, but that sounds a bit too optimistic (I'll do some benchmarks though). "Suggested" messages might also be handy in case when (for some reason I need to think through) two nodes happen to have the same number of messages, but some of the messages differ.

      So the protocol now looks like this:

      1. Alice wants to send Bob a message from her Ai, and Bob's Bj is online.
      2. Ai → Bj: send the new count of messages plus the new message's messageID and body; retransmit until a reply is received.
        1. If the message counts still differ after processing the new message, run the binary search sync, retransmitting each, well, cue until a reply is received.
        2. When a node receives a message count equal to its own, it sends a reply "sync done, thanks" (like FIN), which is retransmitted until a confirmation reply (like FIN ACK) is received.
        3. "Sync done confirmed" ("FIN ACK") is sent each time a "sync done, thanks" is received. No reply is expected (the stop clause), and therefore no retransmission is required.
      3. Ai → other Ak, and Bj → other Bl: similar, but
        1. The forwarded message contains a list of nodes which already have it; nodes add themselves to the list before forwarding it further and do not contact the nodes on the list while doing so.
        2. Ai and Bj may attach the message immediately, being the first to have it; the others need to "suggest" it first.
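      Roughly, the receive side of steps 2-2.3 could look like this (the packet types and helper names - MSG, FIN, FIN_ACK, send_until_acked() and so on - are made-up placeholders to be filled in, not an existing protocol):

          # Sketch of dispatching the packets described in the steps above.
          sub handle_packet {
              my ($pkt) = @_;

              if ( $pkt->{type} eq 'MSG' ) {                               # step 2
                  store_message( $pkt->{id}, $pkt->{body} ) unless have_message( $pkt->{id} );
                  if ( my_message_count() == $pkt->{count} ) {
                      send_until_acked( $pkt->{from}, { type => 'FIN' } ); # step 2.2: "sync done, thanks"
                  }
                  else {
                      start_binary_search_sync( $pkt->{from} );            # step 2.1
                  }
              }
              elsif ( $pkt->{type} eq 'FIN' ) {
                  send_once( $pkt->{from}, { type => 'FIN_ACK' } );        # step 2.3: never retransmitted
              }
              elsif ( $pkt->{type} eq 'FIN_ACK' ) {
                  stop_retransmitting( $pkt->{from}, 'FIN' );              # the stop clause for 2.2
              }
          }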

      Do these additions sound sane?

        Starting at the end:

        Do these additions sound sane?

        Um. After my 3rd reading, nothing you've described leaps off the page as unimplementable; but ... having designed and implemented a couple of comms protocols in the past, neither would I expect to detect any flaws from reading the description.

        Such protocols are inherently asynchronous, and as such, it is extremely hard to envisage them from wordy descriptions. There are only 3 ways I know of to adequately design/envisage/test a protocol of this type:

        • Comprehensive state diagrams.

          These are diagrams showing, in exhaustive detail, the time flow of messages and acknowledgements between communicating pairs.

          You start with the simple, everything works case:

          Alice                             Bob

          [new msg] ----------------->
                    <----------------- [ack]
          [ack]     ----------------->

          Then you introduce fail (non)events:

          [new msg] ----------------->
              ...
          [new msg] ----------------->
              ...
          [new msg] ----------------->
                    <----------------- [ack]
          [ack]     ----------------->

          Then you introduce fail (non)events:

          [new msg] ----------------->
                    <----------------- [ack]
                    ............
                    <----------------- [ack]
                    ............
                    <----------------- [ack]
          [ack]     ----------------->

          Then more fail (non)events:

          [new msg] ----------------->
                    <----------------- [ack]
                    ............
                    <----------------- [ack]
                    ............
                    <----------------- [ack]
                    ............
          [new msg] ----------------->
                    <----------------- [ack]
          [ack]     ----------------->
        • Not really an alternative to the above (which I think are an essential starting point), but complementary to them, is a simulation.

          Effectively, a slowed-down, high level implementation -- I'd use threads and queues; you could also use processes and sockets, though I think that is harder -- of the real thing; that runs in-box, preferably with a gui front end that allows you to watch the flows and manually break connections, add and remove comms partners, etc. and see what happens (a stripped-down, toy version of this idea is sketched just after this list).

          The first time I did this, we used a simple console-type screen and ansi escape sequences for the "gui". Three "systems" were the left, top, and right edges of the screen; and we had control "buttons" on the bottom edge.

          Some of the buttons allowed us to single step the "time", so that we could watch the protocol in flow; and others allowed us to block or lose a message or ack, to see what happened. Etc.

        • Mathematical modelling. (Warning: I didn't understand and had(have) no real faith in this.)

          The second comms protocol I worked on was effectively the same as things like MSMQ and AMQP; but before they existed. It ran under OS/2, and was designed to give disparate systems (mainframes, minis, PCs and specialist hardware) within a large Insurance Company full interprocess communications. I was brought in for my OS/2 expertise (subcontracted, via a bespoke software house, via IBM Global).

          They, the insurance company, had an employee -- an apparently brilliant mathematician -- whose brief took him all over the company to drop in on projects of all types and see what insights, if any, he could provide. When he dropped in on our project, we were asked to take him through our state/flow diagrams -- they were drawn in a slide show/presentation software package the name of which I forget -- and there were about 150 of them.

          He sat through a 2 hour, highly compressed, slide-show presentation, taking notes, asking a few pointed questions, and left without giving any feedback. Two weeks later he came back carrying three of those A0-sized easel pads, covered in equations. The notation was unlike anything I'd ever seen before or since.

          He "showed" us how these related to the state/flow diagrams we'd presented. He then showed us 3 separate places where our diagrams did not cover all the possibilities. We (none of us) understood his nomenclature; nor could we relate it back to out diagrams such that we could find the flaws he identified. But, the customer had great faith in the guy, so we had to show that we had both listened and addressed his concerns.

          Since we couldn't understand his stuff, and he couldn't communicate the problems in terms we understood, we offered empirical proof. We set up a dedicated machine running OS/2 and a dozen agent processes that ran continuously, with full logging, initiating random conversations between themselves. Occasionally (at random) they would drop, abort or just abandon a conversation. And a separate process would randomly either kill a running agent or start a new one.

          That ran for a day and a half before it logged a problem. The mathematician claimed it was one of the three he had warned of. He was vindicated.

          It ran on, non-stop (except for copying the log files off to free disc space) for another month without either reproducing that error, or producing either of the other two.

          My part of the project was done and I moved on. I never heard what was done about the other two predicted flaws.

          The point of the story is: multi-partner comms protocols are hard. Bugs can lurk deep in them for a very long time before they manifest themselves; and when they do, they can be nigh impossible to reproduce or track down.
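        Coming back to the simulation point above: a stripped-down, toy version might be nothing more than a single-process, time-stepped loop with a lossy "network" array, just enough to watch the retransmit/ack flow on a console (the threads-and-queues version, or a gui front end, would come later):

            # Toy time-stepped simulation: Alice resends until she sees Bob's ack,
            # and the "network" randomly loses packets in flight.
            use strict;
            use warnings;

            my @in_flight;                       # packets currently crossing the "network"
            my $alice_has_ack = 0;

            sub send_pkt { push @in_flight, { to => $_[0], body => $_[1] } }

            for my $tick ( 1 .. 20 ) {
                send_pkt( bob => 'new msg' ) unless $alice_has_ack;   # retransmit until acked

                my @arrived;
                for my $pkt (@in_flight) {       # deliver or "lose" each packet this tick
                    if ( rand() < 0.5 ) { print "t$tick: lost '$pkt->{body}'\n"; next }
                    push @arrived, $pkt;
                }
                @in_flight = ();

                for my $pkt (@arrived) {
                    print "t$tick: $pkt->{to} received '$pkt->{body}'\n";
                    if    ( $pkt->{to} eq 'bob' )   { send_pkt( alice => 'ack' ) }
                    elsif ( $pkt->{body} eq 'ack' ) { $alice_has_ack = 1 }
                }
                last if $alice_has_ack;
            }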

        "Suggested" messages might also be handy in case when (for some reason I need to think through) two nodes happen to have the same number of messages, but some of the messages differ.

        This sounds dangerous to me. The only 'secret' for a good comms protocol I can offer, is keep it simple.

        The above sounds very like a case of: "Oh. This might be nice.". And that is (IMO) the start of the rot.

        The very moment you start adding things without having a very clear, real scenario that must be addressed, you have started on the slippery slope of either featuritise or futuritise; and that is the very opposite of KISS. As soon as you start down that road, you are on a journey to failure. (IMO :)

        Perhaps the best advice I could offer at this point is: Have you really looked at all the existing mechanisms and alternatives?

        And the second best is: Keep your communications protocol (the triple handshake stuff) entirely separate from the inbox synchronisation stuff (the binary chop).

        The third would have to be: if you go the route of writing this from scratch; mock up your simulation early, and run it hard. And make sure that you can quickly relate the logs back to your state/flow diagrams.

        And if that all sounds like I am trying to put you off from doing this all yourself; I kind of am. Because reading your description above suggests that you are conflating the triple-handshake mechanism -- whose purpose, as I described it, was just guaranteed delivery -- with the synchronisation mechanism -- which I suggested should be a secondary process.

        A major purpose of the guaranteed delivery is that communicating ids should never be in the situation where they have differing opinions of their communications history. Either the message was sent and received, and both ids know that. Or, the delivery failed; and both ids know that.

        The synchronisation process only happens between devices owned by the same ID; not between IDs.

        I wish I could do a better job of conveying my meaning here; but I hope that this will be of some use to you. However, what you are asking for here is well beyond the scope of the type of help you should expect to get from a forum such as this.


Re: [OT] Merging messages between users on different machines
by bitingduck (Deacon) on Jun 21, 2015 at 18:57 UTC
    Each message could also have stored with it a digest like MD5(Timestamp+message). Then you sync by comparing lists of the hash, which is conveniently fixed length and easily ordered.
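    For instance, with the core Digest::MD5 module (the message list here is just sample data standing in for however the messages actually get stored and exchanged):

        # Give each message a fixed-length id, MD5(timestamp . body), then find
        # which of my digests the other side is missing.
        use strict;
        use warnings;
        use Digest::MD5 qw(md5_hex);

        sub message_digest {
            my ($msg) = @_;                      # { ts => ..., body => ... }
            return md5_hex( $msg->{ts} . $msg->{body} );
        }

        sub missing_from_peer {
            my ( $mine, $theirs ) = @_;          # two arrayrefs of hex digests
            my %theirs = map { $_ => 1 } @$theirs;
            return grep { !$theirs{$_} } @$mine;
        }

        my @mine   = sort map { message_digest($_) }
                     ( { ts => 1434910000, body => 'hi Bob' },
                       { ts => 1434910060, body => 'cat picture' } );
        my @theirs = ( $mine[0] );               # pretend the peer only has the first one
        print "need to send: @{[ missing_from_peer( \@mine, \@theirs ) ]}\n";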

      MD5(Timestamp+message)
      That's an idea, thanks. If both the timestamp and the contents of two messages coincide, then the messages are indeed the same. (Though I think SHA1 is better in terms of collisions, and I won't employ SHA2 there because cryptographic integrity will be protected by SSL.)

      Also, I still need a way to distinguish the messages as they are delivered to different users (I need some time to prepare an answer for you, BrowserUK).

      Then you sync by comparing lists of the hash, which is conveniently fixed length and easily ordered.
      So, part of the question is comparing two sorted arrays of 20-byte numbers. Thank you, now I understand better how tools like rsync work. The open part of the question is: which shortcuts I can apply to avoid exchanging full lists of messages? (for me, the worst case would be (1e5 * 20) = 2M of data per sync attempt, which is okay if performed rarely, but quite a lot - and may require resuming - on an EDGE connection)

        MD5(Timestamp+message)

        I strongly suggest that you keep the timestamps outside of (separate from) whichever digest function you use. Digests are inherently unorderable with respect to time.

        The open part of the question is: which shortcuts I can apply to avoid exchanging full lists of messages?

        In the following, assume: messageNo is a monotonically increasing number from oldest to newest; messageId is timestamp+digest[+deviceId+deviceId].

        1. Swap counts; same and you're done.
        2. Otherwise; swap messageId of the message at messageNo (smaller count/2) from the beginning.
        3. Same; add (smaller count/4) to previous messageNo (smaller count/2) and exchange messageIds. goto step 3.
        4. Different; subtract (smaller count/4) from previous messageNo and exchange messageIds. goto step 3.

        Basically; do a binary search (oldest->newest) to locate the first difference; and then move forward from that point; exchanging first messageIds; then full messages if needed.
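        In Perl, the locate-the-first-difference part might look roughly like this (both lists are local arrays here for illustration; in reality each comparison is a messageId exchange between the two devices, and the binary chop is only valid because the two histories share a common prefix and differ from some point onward):

            # Binary search for the first messageNo at which two ordered message
            # histories (oldest -> newest) disagree.
            use strict;
            use warnings;

            sub first_difference {
                my ( $mine, $theirs ) = @_;      # arrayrefs of messageIds
                my $n = @$mine < @$theirs ? scalar @$mine : scalar @$theirs;   # smaller count
                my ( $lo, $hi ) = ( 0, $n );     # the answer lies in [lo, n]

                while ( $lo < $hi ) {
                    my $mid = int( ( $lo + $hi ) / 2 );
                    if ( $mine->[$mid] eq $theirs->[$mid] ) { $lo = $mid + 1 }   # same here: difference is later
                    else                                    { $hi = $mid }      # difference is here or earlier
                }
                return $lo;   # first differing messageNo; equals the smaller count if one history is a prefix of the other
            }

            my @mine   = qw( id1 id2 id3 id5 );
            my @theirs = qw( id1 id2 id4 id5 id6 );
            print first_difference( \@mine, \@theirs ), "\n";    # prints 2

        From that index onward, the two devices exchange full messageIds and then full messages, as described in the steps above.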


Re: [OT] Merging messages between users on different machines
by Anonymous Monk on Jun 21, 2015 at 17:53 UTC

    Your users seem to map to a set of permissions on a distributed filesystem. Or a git repository.

    However, there's no description of your constraints or requirements. The words 'cryptography' or 'secure' do not appear in the question. Is the protocol destined to run on a local network with well-behaving agents?

      Your users seem to map to a set of permissions on a distributed filesystem. Or a git repository.
      This is my point: the idea has already been implemented many times, but all the applications are too different from the field I want to work in. For example, kludging ssh and rsync to do instant messaging would solve the problem, but it's way too painful to deploy and maintain.

      However, there's no description of your constraints or requirements. The words 'cryptography' or 'secure' do not appear in the question. Is the protocol destined to run on a local network with well-behaving agents?
      Not on a local network, but through SSL, and node users will have to explicitly trust each other's public keys before the connection is possible, so the agents should be considered well-behaved. My major is far from cryptography, so I try to go for well-known cryptographic primitives instead of trying to roll my own.