Re: [OT] Merging messages between users on different machines
by BrowserUk (Patriarch) on Jun 21, 2015 at 16:40 UTC
Seems to me, without having done any kind of exhaustive analysis, that in any system of multiple sources to multiple destinations, there is only one universal frame of reference: time. (Ignoring relativity :)
Any counters you employed would need to be synchronised, and that synchronisation leaves windows of opportunity for clashes and mismatches.
You say "But using only timestamps will sooner or later result in a collision."; but unless Alice (or Bob) is in the habit of typing messages on two different devices simultaneously, there is no scope for timestamp collisions originating from one individual.
Unless that one account could legitimately be used by a group of people; in which case, each device should add its device number to the timestamp to ensure uniqueness.
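To make that concrete, here is a minimal sketch of what such an identifier might look like; the field layout and the use of Time::HiRes for sub-second resolution are my own assumptions, not anything prescribed above:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Time::HiRes qw(time);    # sub-second timestamps shrink the collision window

    # Build a message id from a high-resolution timestamp plus the originating
    # device's number, so two devices on the same account cannot collide even
    # if they happen to send in the same instant.
    sub make_msg_id {
        my ($device_id) = @_;
        return sprintf '%.6f-%s', time(), $device_id;
    }

    print make_msg_id('alice-phone'), "\n";    # e.g. 1434904812.123456-alice-phone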
Of course, neither timestamps alone, nor timestamps + deviceId, give any mechanism for recipients to detect they have missed a communication.
I only see three possible approaches to that:
- Guaranteed delivery, a la TCP vs. UDP:
If sent messages required positive acknowledgements, and acknowledgements also required positive acknowledgements (look up "triple handshake"), messages would be resent by their originator until it received an ack; and recipients would discard or disregard receipts until they received acks to their acks.
And yes, it's recursive, so it needs a bail-out-and-start-again clause.
- Swap (or augment) the deviceId with a device counter:
If each sender (device) maintains a monotonically increasing count of outbound comms with every recipient device; then when a recipient receives a message from a given sender, it can check to see if it missed anything since the last message it received from that sender.
Of course, if the sender sends a message that fails to deliver; and never sends another; the recipient will never know it missed something.
- Master senders and/or master receivers:
If each group of devices chose and maintained an official server device, and all outbound and inbound comms were redirected through that device, it could maintain sender/receiver pair counts that could serve to uniquely identify communications; but the logistics are horrible.
It would require regular handshakes between the current primary and all the other devices; and there would need to be an automated fail-over mechanism; and an automated fail-over-fail handover mechanism....
In fact, I'm regretting ever mentioning a "third option" :)
All in all, treating every device-to-device communication as a separate id (deviceId+deviceId+count) seems to be the only sure way of detecting missed communications in a timely and guaranteed manner.
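As a sketch of that idea -- the data structures below are hypothetical, purely to illustrate the bookkeeping -- each device keeps one monotonically increasing outbound counter per peer device, and the receiver spots a gap by comparing the incoming count against the last one it saw from that sender:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Sender side: one monotonically increasing counter per destination device.
    my %out_count;                    # peer device id => last count sent
    sub next_outbound {
        my ($from, $to) = @_;
        return { from => $from, to => $to, count => ++$out_count{$to} };
    }

    # Receiver side: remember the last count seen from each sender device and
    # flag any jump, which means at least one communication went missing.
    my %last_seen;                    # sender device id => last count received
    sub receive {
        my ($msg) = @_;
        my $expected = ($last_seen{ $msg->{from} } // 0) + 1;
        if ($msg->{count} > $expected) {
            warn "missed counts $expected..", $msg->{count} - 1,
                 " from $msg->{from}\n";
        }
        $last_seen{ $msg->{from} } = $msg->{count};
    }

    # Example: the second of three messages is lost in transit.
    my @sent = map { next_outbound('alice-phone', 'bob-laptop') } 1 .. 3;
    receive($sent[0]);
    receive($sent[2]);                # count 3 arrives while 2 was expected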
And the process of sharing and consolidating communications between groups of devices allocated to any given userid is a secondary process that happens on a "best endeavour" basis.
This allows the user of a group of devices to eventually become aware of all communications, including any missed ones; but it leaves a time window during which they may be unaware of the true state of the conversation.
That time window of incomplete information is potentially open-ended; but that will always be the case when any device could fail permanently at any given time.
Not much in the way of a useful answer there; but maybe food for thought.
First of all, thank you very much. It took quite some time to digest your advice :)
A "message - ACK - ACK-to-ACK" model represents a dialog very well. Combining sending messages with synchronisation attempts (Hi Bob, my version of dialog between us consists of 1239 messages, there's a new one attached: "..." - Hi Alice, now I have 1239 messages too, thanks!) takes care of the case when Bob had had one device with the message from Alice and the device is lost: they'll sync it next time any of them sends a message to the other.
To speed up the synchronisation process in common cases, the ability to "suggest" messages may be useful. For example, when node Ak has just received a new message from Ai and wants to forward it to another node Al, Ak can't be sure that Al doesn't already have this message (maybe Al has been contacted by Ai, too). So Ak launches the synchronisation algorithm and suggests (sends the messageID of) the new message as the one that may be missing: if Al doesn't have a message with this ID, it requests it and the problem is solved, at least partially; if it does, that's ~20 bytes wasted. I even thought that the node with the most messages should always suggest the last (nmax - nmin) messages known to it, but that sounds a bit too optimistic (I'll do some benchmarks though). "Suggested" messages might also be handy in the case where (for some reason I need to think through) two nodes happen to have the same number of messages, but some of the messages differ.
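A minimal sketch of the receiving node's side of such a suggestion; the %have hash and the request_message helper are hypothetical names, purely for illustration:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Al's side: the set of messageIDs it already holds.
    my %have = map { $_ => 1 } qw(id-001 id-002 id-003);

    sub request_message { print "requesting full message $_[0]\n" }

    # Handle a "suggestion" from Ak: if the ID is unknown, ask for the full
    # message; if it is already known, the suggestion only cost ~20 bytes.
    sub handle_suggestion {
        my ($msg_id) = @_;
        return if $have{$msg_id};          # already have it
        request_message($msg_id);          # (mark it held once the body arrives)
    }

    handle_suggestion('id-004');           # unknown => requested
    handle_suggestion('id-002');           # known   => ignored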
So the protocol now looks like this (a sketch of the sender-side flow follows the list):
- Alice wants to send Bob a message from her Ai and Bob's Bj is online.
- Ai → Bj: send the new message count plus the new message's messageID and body; retransmit until a reply is received.
- If the message counts still differ after processing the new message, run the binary-search sync, retransmitting each probe until a reply is received.
- When a node receives a message count equal to its own, it sends a reply "sync done, thanks" (like FIN), which is retransmitted until a confirmation reply (like FIN-ACK) is received.
- "Sync done confirmed" ("FIN-ACK") is sent each time a "sync done, thanks" is received. No reply is expected (this is the stop clause), and therefore no retransmission is required.
- Ai → other Ak and Bj → other Bl: similar, but
- The forwarded message contains a list of nodes which already have the message; nodes add themselves to it before forwarding it further and do not contact the nodes from the list while doing so.
- Ai and Bj may attach the message immediately, being the first to have it; others need to "suggest" it first.
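For concreteness, here is a rough sketch of the sender-side flow above. The helpers (send_until_reply, binary_search_sync, send_once) are hypothetical names of my own, standing in for the retransmission and binary-search machinery:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Trivial stand-ins for the real transport; a real send_until_reply would
    # retransmit on a timer until the peer answers.
    sub send_until_reply {
        my ($peer, $pkt) = @_;
        print "-> $peer: ", join(' ', map { "$_=$pkt->{$_}" } sort keys %$pkt), "\n";
        return { count => 1239 };          # pretend Bj replied with its new count
    }
    sub send_once          { print "-> $_[0]: sync_done_confirmed\n" }
    sub binary_search_sync { print "   binary-search sync with $_[0]\n" }

    # Ai -> Bj: deliver one new message, then bring both inboxes into agreement.
    sub deliver_and_sync {
        my ($peer, $my_count, $msg_id, $body) = @_;

        # 1. Send the new count plus the message; retransmit until Bj replies.
        my $reply = send_until_reply($peer,
            { count => $my_count, id => $msg_id, body => $body });

        # 2. If the counts still disagree, run the binary-search sync,
        #    retransmitting each probe until it is answered.
        binary_search_sync($peer) if $reply->{count} != $my_count;

        # 3. "sync done, thanks" (FIN) is retransmitted until the confirmation
        #    (FIN-ACK) arrives.
        send_until_reply($peer, { type => 'sync_done' });
    }

    # Bj's side of the termination: confirm every "sync done" it sees, but never
    # retransmit the confirmation -- that is the stop clause.
    sub on_sync_done { send_once($_[0]) }

    deliver_and_sync('Bj', 1239, '1434904812.123456-Ai', 'Hi Bob!');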
Do these additions sound sane?
Starting at the end:
Do these additions sound sane?
Um. After my 3rd reading, nothing you've described leaps off the page as unimplementable; but ... having designed and implemented a couple of comms protocols in the past, neither would I expect to detect any flaws from reading the description.
Such protocols are inherently asynchronous, and as such, it is extremely hard to envisage them from wordy descriptions. There are only 3 ways I know of to adequately design/envisage/test a protocol of this type:
- Comprehensive state diagrams.
These are diagrams showing, in exhaustive detail, the time flow of messages and acknowledgements between communicating pairs.
You start with the simple, everything works case:

    Alice                                Bob
    [new msg]  ----------------->
               <-----------------  [ack]
    [ack]      ----------------->

Then you introduce fail (non)events:

    Alice                                Bob
    [new msg]  ----------------->
               ...
    [new msg]  ----------------->
               ...
    [new msg]  ----------------->
               <-----------------  [ack]
    [ack]      ----------------->

Then you introduce fail (non)events:

    Alice                                Bob
    [new msg]  ----------------->
               <-----------------  [ack]
               ............
               <-----------------  [ack]
               ............
               <-----------------  [ack]
    [ack]      ----------------->

Then more fail (non)events:

    Alice                                Bob
    [new msg]  ----------------->
               <-----------------  [ack]
               ............
               <-----------------  [ack]
               ............
               <-----------------  [ack]
               ............
    [new msg]  ----------------->
               <-----------------  [ack]
    [ack]      ----------------->
- Not really an alternative to the above (which I think are an essential starting point), but complementary to them, is a simulation.
Effectively, a slowed-down, high-level implementation of the real thing -- I'd use threads and queues; you could also use processes and sockets, though I think that is harder -- that runs in-box, preferably with a GUI front end that allows you to watch the flows and manually break connections, add and remove comms partners, etc., and see what happens. (A minimal sketch of this approach appears a little further down.)
The first time I did this, we used a simple console-type screen and ANSI escape sequences for the "GUI". Three "systems" were the left, top, and right edges of the screen; and we had control "buttons" on the bottom edge.
Some of the buttons allowed us to single-step the "time", so that we could watch the protocol in flow; and others allowed us to block or lose a message or ack, to see what happened. Etc.
- Mathematical modelling. (Warning: I didn't understand and had (and have) no real faith in this.)
The second comms protocol I worked on was effectively the same as things like MSMQ and AMQP; but before they existed. It ran under OS/2, and was designed to give the disparate systems (mainframes, minis, PCs and specialist hardware) within a large insurance company full interprocess communications. I was brought in for my OS/2 expertise (subcontracted, via a bespoke software house, via IBM Global).
They, the insurance company, had an employee -- an apparently brilliant mathematician -- whose brief took him all over the company to drop in on projects of all types and see what insights, if any, he could provide. When he dropped in on our project, we were asked to take him through our state/flow diagrams -- they were drawn in a slide show/presentation software package the name of which I forget -- and there were about 150 of them.
He sat through a 2 hour, highly compressed, slide-show presentation, taking notes, asking a few pointed questions, and left without giving any feedback. Two weeks later he came back carrying three of those A0-sized easel pads, covered in equations. The notation was unlike anything I'd ever seen before or since.
He "showed" us how these related to the state/flow diagrams we'd presented. He then showed us 3 separate places where our diagrams did not cover all the possibilities. None of us understood his nomenclature; nor could we relate it back to our diagrams such that we could find the flaws he identified. But the customer had great faith in the guy, so we had to show that we had both listened and addressed his concerns.
Since we couldn't understand his stuff, and he couldn't communicate the problems in terms we understood, we offered empirical proof. We set up a dedicated machine running OS/2 and a dozen agent processes that ran continuously, with full logging, initiating random conversations between themselves. Occasionally (randomly) they would either drop, abort or just abandon a conversation. And a separate process would randomly either kill a running agent or start a new one.
That ran for a day and a half before it logged a problem. The mathematician claimed it was one of the three he had warned of. He was vindicated.
It ran on, non-stop (except for copying the log files off to free disc space), for another month without either reproducing that error or producing either of the other two.
My part of the project was done and I moved on. I never heard what was done about the other two predicted flaws.
The point of the story is: multi-partner comms protocols are hard. Bugs can lurk deep in them for a very long time before they manifest themselves; and when they do, they can be nigh impossible to reproduce or track down.
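Coming back to the simulation point above: a minimal sketch of the threads-and-queues style of in-box simulation. Everything here -- the queue layout, the 40% drop rate, the one-second retransmit timer -- is my own assumption, just to show the shape of the thing:

    #!/usr/bin/perl
    # Two agents exchange messages over lossy "wires"; Alice retransmits until
    # she sees an ack. Run it and watch the drops and resends in the log.
    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    my $wire_ab = Thread::Queue->new;    # Alice -> Bob
    my $wire_ba = Thread::Queue->new;    # Bob   -> Alice
    my $DROP    = 0.4;                   # assumed loss rate

    sub send_lossy {
        my ($wire, $who, $pkt) = @_;
        if (rand() < $DROP) { print "[wire ] dropped '$pkt' from $who\n"; return }
        $wire->enqueue($pkt);
    }

    my $bob = threads->create(sub {
        while (defined(my $pkt = $wire_ab->dequeue)) {
            last if $pkt eq 'BYE';
            print "[bob  ] got '$pkt', acking\n";
            send_lossy($wire_ba, 'bob', "ack $pkt");
        }
    });

    my $alice = threads->create(sub {
        for my $n (1 .. 3) {
            my $acked = 0;
            until ($acked) {
                send_lossy($wire_ab, 'alice', "msg $n");
                sleep 1;                                  # crude retransmit timer
                while (defined(my $ack = $wire_ba->dequeue_nb)) {
                    $acked = 1 if $ack eq "ack msg $n";
                }
                print "[alice] timeout, resending msg $n\n" unless $acked;
            }
            print "[alice] msg $n delivered\n";
        }
        $wire_ab->enqueue('BYE');                         # shut Bob down
    });

    $_->join for $alice, $bob;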
"Suggested" messages might also be handy in case when (for some reason I need to think through) two nodes happen to have the same number of messages, but some of the messages differ.
This sounds dangerous to me. The only 'secret' to a good comms protocol I can offer is: keep it simple.
The above sounds very like a case of "Oh, this might be nice." And that is (IMO) the start of the rot.
The very moment you start adding things without having a very clear, real scenario that must be addressed, you have started down the slippery slope of featuritis or futuritis; and that is the very opposite of KISS. As soon as you start down that road, you are on a journey to failure. (IMO :)
Perhaps the best advice I could offer at this point is: Have you really looked at all the existing mechanisms and alternatives?
And the second best is: Keep your communications protocol (the triple handshake stuff) entirely separate from the inbox synchronisation stuff (the binary chop).
The third would have to be: if you go the route of writing this from scratch; mock up your simulation early, and run it hard. And make sure that you can quickly relate the logs back to your state/flow diagrams.
And if that all sounds like I am trying to put you off from doing this all yourself; I kind of am. Because your description above suggests that you are conflating the triple-handshake mechanism -- whose purpose, as I described it, was just guaranteed delivery -- with the synchronisation mechanism -- which I suggested should be a secondary process.
A major purpose of the guaranteed delivery is that communicating IDs should never be in the situation where they have differing opinions of their communications history. Either the message was sent and received, and both IDs know that; or the delivery failed, and both IDs know that.
The synchronisation process only happens between devices owned by the same ID; not between IDs.
I wish I could do a better job of conveying my meaning here; but I hope that this will be of some use to you. However, what you are asking for here is well beyond the scope of the type of help you should expect to get from a forum such as this.
Re: [OT] Merging messages between users on different machines
by bitingduck (Deacon) on Jun 21, 2015 at 18:57 UTC
Each message could also have stored with it a digest like MD5(Timestamp+message). Then you sync by comparing lists of the hashes, which are conveniently fixed-length and easily ordered.
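Something along these lines, for instance; Digest::MD5 is core Perl, and the field values are made up for illustration:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Digest::MD5 qw(md5_hex);

    # Store a fixed-length digest of timestamp+message alongside each message;
    # nodes can then compare sorted lists of these digests when syncing.
    my $timestamp = 1434904812;
    my $message   = 'Hi Bob, lunch at noon?';
    my $digest    = md5_hex($timestamp . $message);    # always 32 hex characters

    print "$digest\n";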
MD5(Timestamp+message)
I strongly suggest that you keep the timestamps outside of (separate from) whichever digest function you use. Digests are inherently unorderable with respect to time.
The open part of the question is: which shortcuts I can apply to avoid exchanging full lists of messages?
In the following assume: messageNo is a monotonically increasing number from oldest to newest; messageId is timestamp+digest[+deviceId+deviceId].
1. Swap counts; if they are the same, you're done.
2. Otherwise, swap the messageId of the message at messageNo (smaller count / 2) from the beginning.
3. Same? Add (smaller count / 4) to the previous messageNo (smaller count / 2) and exchange messageIds; go to step 3.
4. Different? Subtract (smaller count / 4) from the previous messageNo and exchange messageIds; go to step 3.
Basically, do a binary search (oldest → newest) to locate the first difference; then move forward from that point, exchanging messageIds first and full messages as needed.
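A sketch of that binary chop in code -- purely illustrative: it assumes each side can expose its time-ordered list of messageIds, that the two histories agree up to some point and differ from there on, and it glosses over the retransmission wrapper around each probe:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use List::Util qw(min);

    # Find the first messageNo (0-based) at which two time-ordered lists of
    # messageIds diverge. In the real protocol each probe of $theirs->[$mid]
    # would be a request/reply exchange with the peer, not an array lookup.
    sub first_difference {
        my ($mine, $theirs) = @_;
        my ($lo, $hi) = (0, min(scalar @$mine, scalar @$theirs));   # $hi exclusive
        while ($lo < $hi) {
            my $mid = int(($lo + $hi) / 2);
            if ($mine->[$mid] eq $theirs->[$mid]) {
                $lo = $mid + 1;     # everything up to and including $mid matches
            }
            else {
                $hi = $mid;         # the divergence is at $mid or earlier
            }
        }
        return $lo;                 # first mismatch, or length of the shorter list
    }

    # Example: the two inboxes share the three oldest messages, then diverge.
    my @alice = qw(100-aa 101-bb 102-cc 103-dd 104-ee);
    my @bob   = qw(100-aa 101-bb 102-cc 103-xx);
    print "first difference at messageNo ", first_difference(\@alice, \@bob), "\n";
    # From that point on, exchange messageIds first, then full messages as needed.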
Re: [OT] Merging messages between users on different machines
by Anonymous Monk on Jun 21, 2015 at 17:53 UTC
Your users seem to map to a set of permissions on a distributed filesystem. Or a git repository.
However, there's no description of your constraints or requirements. The words 'cryptography' or 'secure' do not appear in the question. Is the protocol destined to run on a local network with well-behaving agents?
Your users seem to map to a set of permissions on a distributed filesystem. Or a git repository.
This is my point: the idea has already been implemented many times, but all the applications are too different from the field I want to work in. For example, kludging ssh and rsync to do instant messaging would solve the problem, but it's way too painful to deploy and maintain.
However, there's no description of your constraints or requirements. The words 'cryptography' or 'secure' do not appear in the question. Is the protocol destined to run on a local network with well-behaving agents?
Not on a local network, but through SSL, and node users will have to explicitly trust each other's public keys before the connection is possible, so the agents should be considered well-behaved. My major is far from cryptography, so I try to go for well-known cryptographic primitives instead of trying to roll my own.