in reply to Re: [OT] Merging messages between users on different machines
in thread [OT] Merging messages between users on different machines

First of all, thank you very much. It took quite some time to digest your advice :)

A "message - ACK - ACK-to-ACK" model represents a dialog very well. Combining sending messages with synchronisation attempts (Hi Bob, my version of dialog between us consists of 1239 messages, there's a new one attached: "..." - Hi Alice, now I have 1239 messages too, thanks!) takes care of the case when Bob had had one device with the message from Alice and the device is lost: they'll sync it next time any of them sends a message to the other.

To speed up the synchronisation process in common cases the ability to "suggest" messages may be useful. For example, when node Ak has just received a new message from Ai and it wants to forward it to another node Al, Ak can't be sure that Al doesn't have this message yet (maybe Al had been contacted by Ai, too); so Ak launches the synchronisation algorithm and suggests (sends the messageID) that the missing message (if any) may be the new one; if Al doesn't have a message with this ID, it requests the message and solves the problem, at least partially; if not, that's ~20 bytes wasted. I even thought that the node which has the most messages should always suggest last (nmax-nmin) messages known to it, but that sounds a bit too optimistic (I'll do some benchmarks though). "Suggested" messages might also be handy in case when (for some reason I need to think through) two nodes happen to have the same number of messages, but some of the messages differ.

So the protocol now looks like that:

  1. Alice wants to send Bob a message from her Ai and Bob's Bj is online.
  2. Ai → Bj: send the new count of the messages and the new message's messageID and body; retransmit until reply received.
    1. If message counts still differ after processing the new message, run the binary search sync, retransmitting each, well, cue until reply is received.
    2. When a node receives a message count equal to its own one, it sends a reply "sync done, thanks" (like FIN), which is retransmitted until confirmation reply (like FIN ACK) is received.
    3. "Sync done confirmed" ("FIN ACK") is sent each time a "sync done, thanks" is received. No reply (stop clause) and therefore no retransmission is required.
  3. Ai → other Ak and Bj → other Bl: similar, but
    1. The forwarded message contains a list of nodes which already have the message; nodes add themselves to it before forwarding it further and do not contact the nodes from the list while doing so.
    2. Ai and Bj may attach the message immideately, being the first to have it; others need to "suggest" it first.

Do these additions sound sane?

  • Comment on Re^2: [OT] Merging messages between users on different machines

Replies are listed 'Best First'.
Re^3: [OT] Merging messages between users on different machines
by BrowserUk (Patriarch) on Jul 03, 2015 at 05:39 UTC

    Starting at the end:

    Do these additions sound sane?

    Um. After my 3rd reading, nothing you've described leaps off the page as unimplementable; but ... having designed and implemented a couple of comms protocols in the past, neither would I expect to detect any flaws from reading the description.

    Such protocols are inherently asynchronous, and as such, it is extremely hard to envisage them from wordy descriptions. There are only 3 ways I know of to adequately design/envisage/test a protocol of this type:

    • Comprehensive state diagrams.

      These are diagrams showing, in exhaustive detail, the time flow of messages and acknowledgements between commumicating pairs.

      You start with the simple, everything works case:

      Alice Bob [new msg] -----------------> <----------------- [ack] [ack] ----------------->

      Then you introduce fail (non)events:

      [new msg] -----------------> ... [new msg] -----------------> ... [new msg] -----------------> <----------------- [ack] [ack] ----------------->

      Then you introduce fail (non)events:

      [new msg] -----------------> <----------------- [ack] ............ <----------------- [ack] ............ <----------------- [ack] [ack] ----------------->

      Then more fail (non)events:

      [new msg] -----------------> <----------------- [ack] ............ <----------------- [ack] ............ <----------------- [ack] ............ [new msg] -----------------> <----------------- [ack] [ack] ----------------->
    • Not really an alternative to the above (which I think are an essential starting point), but complementary to them, is a simulation.

      Effectively, a slowed-down, high level implementation -- I'd use threads and queues; you could also use processes and sockets, though I think that is harder -- of the real thing; that runs in-box, preferably with a gui front end that allows you to watch the flows and manually break connections, add and remove comms partners, etc. and see what happens.

      The first time I did this, we used a simple console-type screen and ansi escape sequences for the "gui". Three "systems" were the left, top, and right edges of the screen; and we had control "buttons" on the bottom edge.

      Some of the buttons allowed us to single step the "time", so that we could watch the protocol in flow; and others allowed us to block or loose a message or ack, to see what happened. Etc.

    • Mathematical modelling. (Warning: I didn't understand and had(have) no real faith in this.)

      The second comms protocol I worked on was effectively the same as things like MSMQ and AMQP; but before they existed. It ran under OS/2, and was designed to allow disparate systems (mainframes, minis, PCs and specialist hardware) within a large Insurance Company, full interprocess communications. I was bought in for my OS/2 expertise (subcontracted, via a bespoke software house, via IBM Global).

      They, the insurance company, had an employee -- an apparently brilliant mathematician -- who's brief took him all over the company to drop in on projects of all types and see what of any insights he could provide. When he dropped in on our project, we were asked to take him through our state/flow diagrams -- they were drawn in a slide show/presentation software package the name of which I forget -- and there were about 150 of them.

      He sat through a 2 hour, highly compressed, slide-show presentation, taking notes, asking a few pointed question and left without giving any feedback. Two weeks later he came back carrying three of those AO-sized easel pads, covered in equations. The notations were unlike anything I'd ever seen before or since.

      He "showed" us how these related to the state/flow diagrams we'd presented. He then showed us 3 separate places where our diagrams did not cover all the possibilities. We (none of us) understood his nomenclature; nor could we relate it back to out diagrams such that we could find the flaws he identified. But, the customer had great faith in the guy, so we had to show that we had both listened and addressed his concerns.

      Since we couldn't understand his stuff; and he couldn't communicate the problems in terms we understood, we offered empirical proof. We set up a dedicated machine running OS/2 and a dozen agent processes that ran continuously, with full logging, initiating random conversations between themselves. Occasionally, (randomly) they would either drop, abort or just abandon conversation. And randomly, a separate process would randomly either kill a running agent; or start a new one.

      That ran for a day and a half before it logged a problem. The mathematician claimed it was one of the three he had warned of. He was vindicated.

      It ran on, non-stop (accept for copying the log files off to free disc space) for another month without either reproducing that error; or producing either of the other two.

      My part of the project was done and I moved on. I never heard what was done about the other two predicted flaws.

      The point of the story is: multi-partner comms protocols are hard. Bugs can lurk deep in them for a very long time before they manifest themselves; and when they do, they can be nigh impossible to reproduce or track down.

    "Suggested" messages might also be handy in case when (for some reason I need to think through) two nodes happen to have the same number of messages, but some of the messages differ.

    This sounds dangerous to me. The only 'secret' for a good comms protocol I can offer, is keep it simple.

    The above sounds very like a case of: "Oh. This might be nice.". And that is (IMO) the start of the rot.

    The very moment you start adding things; without having a very clear, real scenario that must be addressed you have started on the slippery slope of either featuritise or futuritise; and that is the very opposite of KISS. As soon as you start down that road; you are on a journey to failure.(IMO :)

    Perhaps the best advice I could offer at this point is: Have you really looked at all the existing mechanisms and alternatives?

    And the second best is: Keep your communications protocol (the triple handshake stuff) entirely separate from the inbox synchronisation stuff (the binary chop).

    The third would have to be: if you go the route of writing this from scratch; mock up your simulation early, and run it hard. And make sure that you can quickly relate the logs back to your state/flow diagrams.

    And if that all sounds like I am trying to put you off from doing this all yourself; I kind of am. Because reading your description above suggests that you are conflating the triple-handshake mechanism -- who's purpose as I described it was just for guaranteed delivery -- with the synchronisation mechanism -- which I suggested should be a secondary process.

    A major purpose of the guaranteed delivery is that communicating ids should never be in the situation where they have differing opinions of their communications history. Either the message was sent and received, and both ids know that. Or, the delivery failed; and both ids know that.

    The synchronisation process only happens between devices owned by the same ID; not between IDs.

    I wish I could do a better job of conveying my meaning here; but I hope that this will be of some use to you. However, what you are asking for here is well beyond the scope of the type of help you should expect to get from a forum such as this.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
    I'm with torvalds on this Agile (and TDD) debunked I told'em LLVM was the way to go. But did they listen!
    -----------------

      Well.

      I knew I was trying to jump over my head, so it's not entirely unexpected.

      without having a very clear, real scenario that must be addressed you have started on the slippery slope of either featuritise or futuritise; and that is the very opposite of KISS
      I've read so much about featureism, and now I fall prey to it myself. It must be easier when one is both the developer and one's own customer and, also, when one is too enthusiastic about the project. Having admitted that I must also say that it is possible for syncing between owner's devices to be not enough; i.e. I've been myself in a situation when one of my devices had received a message, then went offline, then I was using another device and had no other way to read the message but to ask the sender (who was online for the whole time) to send it again (except the protocol did not offer any way to sync messages, so I had lost from the beginning). This is why I suggested to implement between-account sync in addition to inter-account sync.

      Have you really looked at all the existing mechanisms and alternatives?
      Some people solve the problem by dropping local storage and leaving everything on the one and only central server. Reliable delivery is thus reduced to secure (HTTPS) and reliable (just hit F5 and reload 2M (not counting images) of javascript web application if something happens) R/W access to the message storage. Others do deliver messages via a central server to end-user devices, but don't even try to ensure that the history is consistent between clients (except for protocol extensions for delivery confirmation messages which get lost as easily as the messages themselves and server-side history storage). I've seen a program behave as if the synchronisation happens like I want it to, but I can't know for sure if it really does and what the program does with the clear text in the meantime because the protocol is closed and the binary is obfuscated (and I don't have the right to examine it anyway: I agreed to EULA). There are also programs which manage to be P2P by some means (over-the-internet networks/NAT hole punching/pure magic) but keep the computer/account ratio strictly equal to one and don't ensure delivery anyway. This also means that the problem I want to solve is a very hard one and I shouldn't be working on it because I'm not a professional programmer. But I will keep looking.

      And if that all sounds like I am trying to put you off from doing this all yourself; I kind of am.
      At this point I'm rather reluctant to continue too and I'd rather stop before I embarass myself even further. One last question though: I understand that it won't be enough to achieve my original goal, but can you suggest me a book which could enlighten me on network protocols and/or data storage formats? Most programming books I stumble upon tend to describe the language itself or algorithms in general. Clearly, there must be something I'm missing: either a classic covering these topics, or a fundamental principle (like "it's impossible to describe data storage in general without accompanying algorithms").

        can you suggest me a book which could enlighten me on network protocols and/or data storage formats?

        I don't have a book recommendation; but one of the ways of validating asynchronous protocols is a technique called Petri Net Modeling;. Besides the (rather overly academic) wikipedia page, a google search turns up a bunch of introductions, slide stacks and lecture notes and pdfs that might help you.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
        In the absence of evidence, opinion is indistinguishable from prejudice.
        I'm with torvalds on this Agile (and TDD) debunked I told'em LLVM was the way to go. But did they listen!