Hi!
My question is not directly related to Perl and I currently don't have any code to show, but people here usually have helped me with algorithms a lot in the past and I plan on implementing this in Perl.
Let's suppose that there are two users, Alice and Bob, each of them has a set of machines/addresses A1..An and B1..Bm. It's not guaranteed that any pair of machines can talk to each other at any given moment of time, because Alice and Bob take their machines offline while they do not use them, but machines which are online can interconnect freely.
Alice and Bob want to exchange messages and want to access all message history from any of their devices. What algorithm should they use to achieve that?
My thinking is along the following lines:
- Each message is identified by a timestamp, big enough so it won't suffer from 2038 problem (also, 1e-3 second precision should be enough for IM). But using only timestamps will sooner or later result in a collision. Perhaps (sender address + timestamp)? Another option would be (sender address + sequentially incrementing integer), and the integer shouldn't even be that large (I've checked: my IM history is around 1e5 messages in size after a few years, so while 16-bit is too small, 32-bit will last more than I probably will). Which is better and why?
- Alice (from the machine Ai) wants to send a message to Bob, so she tries to connect to one of Bj known to her. Once a connection a successful and the message is delivered, she marks the message as sent and stops connecting to Bob. At the same time Ai queues the message to be delivered to other Alice's machines A1..An.
- Bob's machine behaves similarly: once he receives a message from Ai on his machine Bj, it forwards the message to other B1..Bm, once they show online.
Of course, it's not going to work.
- Alice sends Bob a picture of her cat trying to get inside a box. The moment the message is delivered a TLA agent tries to shoot Bob with a laser rifle. Bob escapes alive and unharmed, but his smartphone has paid its life for Bob and won't show him the picture of the cat. Now Alice has the message marked as delivered, so she won't retry sending it, but none of Bob's computers have it. Bob will never know whether cat achieved its goal or not. TLA has won.
- Machine B1 is Bob's home PC, B2 is his work PC and B3 is his smartphone (he bought another one). Ai sends him a message while he's at home, so she connects to B1, and B1 forwards the message to B3. While B3 delivers the message to B2 afterwards, both B1 and B2 will keep the message as "still needs to be delivered" forever because they can't talk to each other.
To get around (1) Bob needs a way to ask Alice to restart the delivery of messages of which he does not know. Alice: "But what messages don't you know of?" Bob: "I don't know!". Should Bob periodically ask Alice whether she sent him messages later than <timestamp of the last message from Alice>? Should Alice ask new Bob's smartphone whether he has the message with <timestamp of the message Alice sent to Bob's old smartphone>?
To get around (2) (both m*(m-1) connections to forward a message instead of (m-1) and storing "undelivered" messages forever) a forwarded message should contain information about which computers have already received this message. But this would solve the problem only partway: while B2 will know that it shouldn't forward the message any further because B3 and B1 already have it, B1 won't know that and will keep trying to connect to B3. Another example: Claire has 4 computers which interconnect 100% of the time, so delivering a message to C1 causes 3 useful connections from C1 to C2..4 and another 6 redundant and useless ones between C2..4. At this point I'm close to reimplementing TCP ACK and TTL which means I'm going to over-engineer the protocol, and it will still break under some rare condition I can't think of right now. And this kind of redundancy, while causing (9*8=72 useless/9 useful) = 8 times more useless traffic than useful in case of 10 interconnecting machines, will still be required in case when these machines come online only sequentially and in pairs (like in a party game).
The question: what algorithms (even better, with well-tested implementations on CPAN) should Alice and Bob search for, if they want to merge their instant messages between their machines, some of which may not directly connect to each other? And thank you for your patience if you have stayed with me for the duration of the post.
Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
Read Where should I post X? if you're not absolutely sure you're posting in the right place.
Please read these before you post! —
Posts may use any of the Perl Monks Approved HTML tags:
- a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
| |
For: |
|
Use: |
| & | | & |
| < | | < |
| > | | > |
| [ | | [ |
| ] | | ] |
Link using PerlMonks shortcuts! What shortcuts can I use for linking?
See Writeup Formatting Tips and other pages linked from there for more info.