Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi!

My question is not directly related to Perl and I currently don't have any code to show, but people here have helped me with algorithms a lot in the past, and I plan on implementing this in Perl.

Let's suppose that there are two users, Alice and Bob, who have sets of machines/addresses A1..An and B1..Bm respectively. It's not guaranteed that any pair of machines can talk to each other at any given moment, because Alice and Bob take their machines offline while they're not using them, but machines which are online can interconnect freely.

Alice and Bob want to exchange messages and want to access all message history from any of their devices. What algorithm should they use to achieve that?

My thinking is along the following lines:

Of course, it's not going to work.

  1. Alice sends Bob a picture of her cat trying to get inside a box. The moment the message is delivered, a TLA agent tries to shoot Bob with a laser rifle. Bob escapes alive and unharmed, but his smartphone pays with its life for him and won't show him the picture of the cat. Now Alice has the message marked as delivered, so she won't retry sending it, but none of Bob's computers have it. Bob will never know whether the cat achieved its goal or not. The TLA has won.
  2. Machine B1 is Bob's home PC, B2 is his work PC and B3 is his smartphone (he bought another one). Alice sends him a message from Ai while he's at home, so Ai connects to B1, and B1 forwards the message to B3. Even though B3 delivers the message to B2 afterwards, both B1 and B2 will keep the message marked as "still needs to be delivered" forever, because they can't talk to each other.

To get around (1) Bob needs a way to ask Alice to restart the delivery of messages he does not know about. Alice: "But what messages don't you know of?" Bob: "I don't know!". Should Bob periodically ask Alice whether she sent him messages later than <timestamp of the last message from Alice>? Should Alice ask Bob's new smartphone whether it has the message with <timestamp of the message Alice sent to Bob's old smartphone>?

To get around (2) (both the m*(m-1) connections to forward a message instead of (m-1), and storing "undelivered" messages forever), a forwarded message should contain information about which computers have already received it. But this would solve the problem only partway: while B2 will know that it shouldn't forward the message any further because B3 and B1 already have it, B1 won't know that and will keep trying to connect to B3. Another example: Claire has 4 computers which interconnect 100% of the time, so delivering a message to C1 causes 3 useful connections from C1 to C2..C4 and another 6 redundant, useless ones between C2..C4. At this point I'm close to reimplementing TCP ACKs and TTLs, which means I'm going to over-engineer the protocol, and it will still break under some rare condition I can't think of right now. And this kind of redundancy, while causing 8 times more useless traffic than useful (9*8=72 useless vs. 9 useful connections) in the case of 10 interconnecting machines, will still be required when these machines come online only sequentially and in pairs (as in a party game).
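A rough sketch of that seen-list forwarding, just to make the idea concrete (send_to() and the peer list are hypothetical placeholders, not an existing API):

    # Sketch: forward a message together with the list of nodes known to have it.
    # This reproduces the partial fix described above, including its weakness:
    # a node not on the list will still be contacted redundantly.
    use strict;
    use warnings;

    sub forward_to_peers {
        my ( $self, $msg, @peers ) = @_;        # $msg->{seen_by} is an arrayref of node names
        my %already_has = map { $_ => 1 } @{ $msg->{seen_by} };

        push @{ $msg->{seen_by} }, $self        # add ourselves before forwarding further
            unless $already_has{$self};

        for my $peer (@peers) {
            next if $peer eq $self || $already_has{$peer};   # don't contact nodes from the list
            send_to( $peer, $msg );             # hypothetical transport call
        }
    }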

The question: what algorithms (even better, with well-tested implementations on CPAN) should Alice and Bob search for, if they want to merge their instant messages between their machines, some of which may not directly connect to each other? And thank you for your patience if you have stayed with me for the duration of the post.


Re: [OT] Merging messages between users on different machines
by BrowserUk (Patriarch) on Jun 21, 2015 at 16:40 UTC

    Seems to me, without having done any kind of exhaustive analysis, that in any system of multiple sources to multiple destinations; there is only one universal frame of reference: time. (Ignoring relativity :)

    Any counters you employed would need to be synchronised, and thus leave windows of opportunity for clashes and mismatches.

    You say "But using only timestamps will sooner or later result in a collision", but unless Alice (or Bob) is in the habit of typing messages on two different devices simultaneously, there is no scope for timestamp collisions originating from one individual.

    Unless that one account could be legitimately used by a group of persons; in which case, each device should add its device number to the timestamp to ensure uniqueness.

    Of course, neither timestamps alone, nor timestamps + deviceId, give any mechanism for recipients to detect they have missed a communication.

    I only see three possible approaches to that:

    1. Guaranteed delivery; a la tcp .v. udp:

      If sent messages required positive acknowledgements; and acknowledgements also required positive acknowledgements (look up "triple-handshake"), messages would be resent by their originator until it received an ack; and recipients would discard or disregard duplicate receipts until they received acks to their acks.

      And yes. It's recursive, so needs a bailout and start again clause.

    2. Swap (or augment) the deviceId with a device counter:

      If each sender (device) maintains a monotonically increasing count of outbound comms with every recipient device; then when a recipient receives a message from a given sender, it can check to see if it missed anything since the last message it received from that sender.

      Of course, if the sender sends a message that fails to deliver; and never sends another; the recipient will never know it missed something.

    3. Master senders and/or master receivers:

      If each group of devices chose and maintained an official server device, and all outbound and inbound comms were redirected through that device, it could maintain sender/receiver pair counts that could serve to uniquely identify communications; but the logistics are horrible.

      It would require regular handshakes between the current primary and all the other devices; and there would need to be an automated fail-over mechanism; and an automated fail-over-fail handover mechanism....

      In fact. I'm regretting ever mentioning a "third option" :)

    All in all, treating every device to device communication as a separate id (deviceId+deviceId+count) seems to be the only sure way of detecting missed communications in a timely and guaranteed manner.
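    A minimal sketch of that deviceId+deviceId+count idea, assuming an in-memory hash for the counters (a real device would persist them):

        # Each device keeps a per-peer outbound counter; a receiver can then spot
        # gaps in the sequence and know it missed something from that sender.
        use strict;
        use warnings;

        my %out_count;      # $out_count{$peer_id} = last number sent to that peer
        my %last_seen;      # $last_seen{$peer_id} = last number received from that peer

        sub next_message_id {
            my ( $my_id, $peer_id ) = @_;
            my $n = ++$out_count{$peer_id};
            return "$my_id:$peer_id:$n";         # deviceId + deviceId + count
        }

        sub note_received {
            my ($msg_id) = @_;
            my ( $from, $to, $n ) = split /:/, $msg_id;
            my $expected = ( $last_seen{$from} // 0 ) + 1;
            warn "missed messages $expected..@{[ $n - 1 ]} from $from\n" if $n > $expected;
            $last_seen{$from} = $n if $n > ( $last_seen{$from} // 0 );
        }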

    And the process of sharing and consolidating communications between groups of devices allocated to any given userid is a secondary process that happens on a "best endeavour" basis.

    This allows for the user of a group of devices to eventually be aware of all communications and any missed ones; but it leaves a time window during which they may be unaware of the true state of the conversation.

    That time window of incomplete information is potentially open-ended; but that will always be the case when any device could fail permanently at any given time.

    Not much in the way of a useful answer there; but maybe food for thought.



      First of all, thank you very much. It took quite some time to digest your advice :)

      A "message - ACK - ACK-to-ACK" model represents a dialog very well. Combining sending messages with synchronisation attempts (Hi Bob, my version of dialog between us consists of 1239 messages, there's a new one attached: "..." - Hi Alice, now I have 1239 messages too, thanks!) takes care of the case when Bob had had one device with the message from Alice and the device is lost: they'll sync it next time any of them sends a message to the other.

      To speed up the synchronisation process in common cases, the ability to "suggest" messages may be useful. For example, when node Ak has just received a new message from Ai and wants to forward it to another node Al, Ak can't be sure that Al doesn't already have this message (maybe Al has been contacted by Ai too); so Ak launches the synchronisation algorithm and suggests (sends the messageID of) the new message as the one that may be missing; if Al doesn't have a message with this ID, it requests the message, which solves the problem at least partially; if it does, that's ~20 bytes wasted. I even thought that the node which has the most messages should always suggest the last (nmax-nmin) messages known to it, but that sounds a bit too optimistic (I'll do some benchmarks though). "Suggested" messages might also be handy in case when (for some reason I need to think through) two nodes happen to have the same number of messages, but some of the messages differ.

      So the protocol now looks like this:

      1. Alice wants to send Bob a message from her Ai, and Bob's Bj is online.
      2. Ai → Bj: send the new count of messages plus the new message's messageID and body; retransmit until a reply is received.
        1. If the message counts still differ after processing the new message, run the binary search sync, retransmitting each, well, cue until a reply is received.
        2. When a node receives a message count equal to its own, it sends a reply "sync done, thanks" (like FIN), which is retransmitted until a confirmation reply (like FIN ACK) is received.
        3. "Sync done confirmed" ("FIN ACK") is sent each time a "sync done, thanks" is received. No reply is expected (the stop clause), and therefore no retransmission is required.
      3. Ai → other Ak, and Bj → other Bl: similar, but
        1. The forwarded message contains a list of nodes which already have it; nodes add themselves to the list before forwarding it further and do not contact the nodes on the list while doing so.
        2. Ai and Bj may attach the message immediately, being the first to have it; the others need to "suggest" it first.
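      Roughly, the receive side of steps 2-2.3 could look like this (the packet types and helper names - MSG, FIN, FIN_ACK, send_until_acked() and so on - are made-up placeholders to be filled in, not an existing protocol):

          # Sketch of dispatching the packets described in the steps above.
          sub handle_packet {
              my ($pkt) = @_;

              if ( $pkt->{type} eq 'MSG' ) {                               # step 2
                  store_message( $pkt->{id}, $pkt->{body} ) unless have_message( $pkt->{id} );
                  if ( my_message_count() == $pkt->{count} ) {
                      send_until_acked( $pkt->{from}, { type => 'FIN' } ); # step 2.2: "sync done, thanks"
                  }
                  else {
                      start_binary_search_sync( $pkt->{from} );            # step 2.1
                  }
              }
              elsif ( $pkt->{type} eq 'FIN' ) {
                  send_once( $pkt->{from}, { type => 'FIN_ACK' } );        # step 2.3: never retransmitted
              }
              elsif ( $pkt->{type} eq 'FIN_ACK' ) {
                  stop_retransmitting( $pkt->{from}, 'FIN' );              # the stop clause for 2.2
              }
          }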

      Do these additions sound sane?

        Starting at the end:

        Do these additions sound sane?

        Um. After my 3rd reading, nothing you've described leaps off the page as unimplementable; but ... having designed and implemented a couple of comms protocols in the past, neither would I expect to detect any flaws from reading the description.

        Such protocols are inherently asynchronous, and as such, it is extremely hard to envisage them from wordy descriptions. There are only 3 ways I know of to adequately design/envisage/test a protocol of this type:

        • Comprehensive state diagrams.

          These are diagrams showing, in exhaustive detail, the time flow of messages and acknowledgements between communicating pairs.

          You start with the simple, everything works case:

          Alice                             Bob

          [new msg] ----------------->
                    <----------------- [ack]
          [ack]     ----------------->

          Then you introduce fail (non)events:

          [new msg] ----------------->
              ...
          [new msg] ----------------->
              ...
          [new msg] ----------------->
                    <----------------- [ack]
          [ack]     ----------------->

          Then you introduce fail (non)events:

          [new msg] ----------------->
                    <----------------- [ack]
                    ............
                    <----------------- [ack]
                    ............
                    <----------------- [ack]
          [ack]     ----------------->

          Then more fail (non)events:

          [new msg] ----------------->
                    <----------------- [ack]
                    ............
                    <----------------- [ack]
                    ............
                    <----------------- [ack]
                    ............
          [new msg] ----------------->
                    <----------------- [ack]
          [ack]     ----------------->
        • Not really an alternative to the above (which I think are an essential starting point), but complementary to them, is a simulation.

          Effectively, a slowed-down, high level implementation -- I'd use threads and queues; you could also use processes and sockets, though I think that is harder -- of the real thing; that runs in-box, preferably with a gui front end that allows you to watch the flows and manually break connections, add and remove comms partners, etc. and see what happens (a stripped-down, toy version of this idea is sketched just after this list).

          The first time I did this, we used a simple console-type screen and ansi escape sequences for the "gui". Three "systems" were the left, top, and right edges of the screen; and we had control "buttons" on the bottom edge.

          Some of the buttons allowed us to single step the "time", so that we could watch the protocol in flow; and others allowed us to block or lose a message or ack, to see what happened. Etc.

        • Mathematical modelling. (Warning: I didn't understand and had(have) no real faith in this.)

          The second comms protocol I worked on was effectively the same as things like MSMQ and AMQP; but before they existed. It ran under OS/2, and was designed to give disparate systems (mainframes, minis, PCs and specialist hardware) within a large Insurance Company full interprocess communications. I was brought in for my OS/2 expertise (subcontracted, via a bespoke software house, via IBM Global).

          They, the insurance company, had an employee -- an apparently brilliant mathematician -- whose brief took him all over the company to drop in on projects of all types and see what insights, if any, he could provide. When he dropped in on our project, we were asked to take him through our state/flow diagrams -- they were drawn in a slide show/presentation software package the name of which I forget -- and there were about 150 of them.

          He sat through a 2 hour, highly compressed, slide-show presentation, taking notes, asking a few pointed questions, and left without giving any feedback. Two weeks later he came back carrying three of those A0-sized easel pads, covered in equations. The notation was unlike anything I'd ever seen before or since.

          He "showed" us how these related to the state/flow diagrams we'd presented. He then showed us 3 separate places where our diagrams did not cover all the possibilities. We (none of us) understood his nomenclature; nor could we relate it back to out diagrams such that we could find the flaws he identified. But, the customer had great faith in the guy, so we had to show that we had both listened and addressed his concerns.

          Since we couldn't understand his stuff, and he couldn't communicate the problems in terms we understood, we offered empirical proof. We set up a dedicated machine running OS/2 and a dozen agent processes that ran continuously, with full logging, initiating random conversations between themselves. Occasionally (at random) they would drop, abort or just abandon a conversation. And a separate process would randomly either kill a running agent or start a new one.

          That ran for a day and a half before it logged a problem. The mathematician claimed it was one of the three he had warned of. He was vindicated.

          It ran on, non-stop (except for copying the log files off to free disc space) for another month without either reproducing that error, or producing either of the other two.

          My part of the project was done and I moved on. I never heard what was done about the other two predicted flaws.

          The point of the story is: multi-partner comms protocols are hard. Bugs can lurk deep in them for a very long time before they manifest themselves; and when they do, they can be nigh impossible to reproduce or track down.
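        Coming back to the simulation point above: a stripped-down, toy version might be nothing more than a single-process, time-stepped loop with a lossy "network" array, just enough to watch the retransmit/ack flow on a console (the threads-and-queues version, or a gui front end, would come later):

            # Toy time-stepped simulation: Alice resends until she sees Bob's ack,
            # and the "network" randomly loses packets in flight.
            use strict;
            use warnings;

            my @in_flight;                       # packets currently crossing the "network"
            my $alice_has_ack = 0;

            sub send_pkt { push @in_flight, { to => $_[0], body => $_[1] } }

            for my $tick ( 1 .. 20 ) {
                send_pkt( bob => 'new msg' ) unless $alice_has_ack;   # retransmit until acked

                my @arrived;
                for my $pkt (@in_flight) {       # deliver or "lose" each packet this tick
                    if ( rand() < 0.5 ) { print "t$tick: lost '$pkt->{body}'\n"; next }
                    push @arrived, $pkt;
                }
                @in_flight = ();

                for my $pkt (@arrived) {
                    print "t$tick: $pkt->{to} received '$pkt->{body}'\n";
                    if    ( $pkt->{to} eq 'bob' )   { send_pkt( alice => 'ack' ) }
                    elsif ( $pkt->{body} eq 'ack' ) { $alice_has_ack = 1 }
                }
                last if $alice_has_ack;
            }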

        "Suggested" messages might also be handy in case when (for some reason I need to think through) two nodes happen to have the same number of messages, but some of the messages differ.

        This sounds dangerous to me. The only 'secret' for a good comms protocol I can offer, is keep it simple.

        The above sounds very like a case of: "Oh. This might be nice.". And that is (IMO) the start of the rot.

        The very moment you start adding things without having a very clear, real scenario that must be addressed, you have started on the slippery slope of either featuritise or futuritise; and that is the very opposite of KISS. As soon as you start down that road, you are on a journey to failure. (IMO :)

        Perhaps the best advice I could offer at this point is: Have you really looked at all the existing mechanisms and alternatives?

        And the second best is: Keep your communications protocol (the triple handshake stuff) entirely separate from the inbox synchronisation stuff (the binary chop).

        The third would have to be: if you go the route of writing this from scratch; mock up your simulation early, and run it hard. And make sure that you can quickly relate the logs back to your state/flow diagrams.

        And if that all sounds like I am trying to put you off from doing this all yourself; I kind of am. Because reading your description above suggests that you are conflating the triple-handshake mechanism -- whose purpose, as I described it, was just guaranteed delivery -- with the synchronisation mechanism -- which I suggested should be a secondary process.

        A major purpose of the guaranteed delivery is that communicating ids should never be in the situation where they have differing opinions of their communications history. Either the message was sent and received, and both ids know that. Or, the delivery failed; and both ids know that.

        The synchronisation process only happens between devices owned by the same ID; not between IDs.

        I wish I could do a better job of conveying my meaning here; but I hope that this will be of some use to you. However, what you are asking for here is well beyond the scope of the type of help you should expect to get from a forum such as this.


Re: [OT] Merging messages between users on different machines
by bitingduck (Deacon) on Jun 21, 2015 at 18:57 UTC
    Each message could also have stored with it a digest like MD5(Timestamp+message). Then you sync by comparing lists of the hash, which is conveniently fixed length and easily ordered.
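    For instance, with the core Digest::MD5 module (the message list here is just sample data standing in for however the messages actually get stored and exchanged):

        # Give each message a fixed-length id, MD5(timestamp . body), then find
        # which of my digests the other side is missing.
        use strict;
        use warnings;
        use Digest::MD5 qw(md5_hex);

        sub message_digest {
            my ($msg) = @_;                      # { ts => ..., body => ... }
            return md5_hex( $msg->{ts} . $msg->{body} );
        }

        sub missing_from_peer {
            my ( $mine, $theirs ) = @_;          # two arrayrefs of hex digests
            my %theirs = map { $_ => 1 } @$theirs;
            return grep { !$theirs{$_} } @$mine;
        }

        my @mine   = sort map { message_digest($_) }
                     ( { ts => 1434910000, body => 'hi Bob' },
                       { ts => 1434910060, body => 'cat picture' } );
        my @theirs = ( $mine[0] );               # pretend the peer only has the first one
        print "need to send: @{[ missing_from_peer( \@mine, \@theirs ) ]}\n";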

      MD5(Timestamp+message)
      That's an idea, thanks. If both the timestamp and the contents of two messages coincide, then the messages are indeed the same. (Though I think SHA1 is better in terms of collisions, and I won't employ SHA2 there because cryptographic integrity will be protected by SSL.)

      Also, I still need a way to distinguish the messages as they are delivered to different users (I need some time to prepare an answer for you, BrowserUK).

      Then you sync by comparing lists of the hash, which is conveniently fixed length and easily ordered.
      So, part of the question is comparing two sorted arrays of 20-byte numbers. Thank you, now I understand better how tools like rsync work. The open part of the question is: which shortcuts I can apply to avoid exchanging full lists of messages? (for me, the worst case would be (1e5 * 20) = 2M of data per sync attempt, which is okay if performed rarely, but quite a lot - and may require resuming - on an EDGE connection)

        MD5(Timestamp+message)

        I strongly suggest that you keep the timestamps outside of (separate from) whichever digest function you use. Digests are inherently unorderable with respect to time.

        The open part of the question is: which shortcuts I can apply to avoid exchanging full lists of messages?

        In the following, assume: messageNo is a monotonically increasing number from oldest to newest; messageId is timestamp+digest[+deviceId+deviceId].

        1. Swap counts; same and you're done.
        2. Otherwise; swap messageId of the message at messageNo (smaller count/2) from the beginning.
        3. Same; add (smaller count/4) to previous messageNo (smaller count/2) and exchange messageIds. goto step 3.
        4. Different; subtract (smaller count/4) from previous messageNo and exchange messageIds. goto step 3.

        Basically; do a binary search (oldest->newest) to locate the first difference; and then move forward from that point; exchanging first messageIds; then full messages if needed.
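        In Perl, the locate-the-first-difference part might look roughly like this (both lists are local arrays here for illustration; in reality each comparison is a messageId exchange between the two devices, and the binary chop is only valid because the two histories share a common prefix and differ from some point onward):

            # Binary search for the first messageNo at which two ordered message
            # histories (oldest -> newest) disagree.
            use strict;
            use warnings;

            sub first_difference {
                my ( $mine, $theirs ) = @_;      # arrayrefs of messageIds
                my $n = @$mine < @$theirs ? scalar @$mine : scalar @$theirs;   # smaller count
                my ( $lo, $hi ) = ( 0, $n );     # the answer lies in [lo, n]

                while ( $lo < $hi ) {
                    my $mid = int( ( $lo + $hi ) / 2 );
                    if ( $mine->[$mid] eq $theirs->[$mid] ) { $lo = $mid + 1 }   # same here: difference is later
                    else                                    { $hi = $mid }      # difference is here or earlier
                }
                return $lo;   # first differing messageNo; equals the smaller count if one history is a prefix of the other
            }

            my @mine   = qw( id1 id2 id3 id5 );
            my @theirs = qw( id1 id2 id4 id5 id6 );
            print first_difference( \@mine, \@theirs ), "\n";    # prints 2

        From that index onward, the two devices exchange full messageIds and then full messages, as described in the steps above.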


Re: [OT] Merging messages between users on different machines
by Anonymous Monk on Jun 21, 2015 at 17:53 UTC

    Your users seem to map to a set of permissions on a distributed filesystem. Or a git repository.

    However, there's no description of your constraints or requirements. The words 'cryptography' or 'secure' do not appear in the question. Is the protocol destined to run on a local network with well-behaving agents?

      Your users seem to map to a set of permissions on a distributed filesystem. Or a git repository.
      This is my point: the idea has already been implemented many times, but all the applications are too different from the field I want to work in. For example, kludging ssh and rsync to do instant messaging would solve the problem, but it's way too painful to deploy and maintain.

      However, there's no description of your constraints or requirements. The words 'cryptography' or 'secure' do not appear in the question. Is the protocol destined to run on a local network with well-behaving agents?
      Not on a local network, but through SSL, and node users will have to explicitly trust each other's public keys before the connection is possible, so the agents should be considered well-behaved. My major is far from cryptography, so I try to go for well-known cryptographic primitives instead of trying to roll my own.