Re: [OT] Merging messages between users on different machines

Replies are listed 'Best First'.
Re^2: [OT] Merging messages between users on different machines by Anonymous Monk on Jun 23, 2015 at 10:28 UTC
MD5(Timestamp+message) That's an idea, thanks. If both timestamp and contents of two messages coincide, then the messages are, indeed, same. _{(Though I think that SHA1 is better in terms of collisions, and I won't employ SHA2 there because cryptographic integrity will be protected by SSL.)} Also, I still need a way to distinguish the messages as they are delivered to different users _{(I need some time to prepare an answer for you, BrowserUK)}. Then you sync by comparing lists of the hash, which is conveniently fixed length and easily ordered. So, part of the question is comparing two sorted arrays of 20-byte numbers. Thank you, now I understand better how tools like rsync work. The open part of the question is: which shortcuts I can apply to avoid exchanging full lists of messages? (for me, the worst case would be (1e5 * 20) = 2M of data per sync attempt, which is okay if performed rarely, but quite a lot - and may require resuming - on a EGDE connection)	[reply]
Re^3: [OT] Merging messages between users on different machines by BrowserUk (Patriarch) on Jun 23, 2015 at 12:16 UTC
MD5(Timestamp+message) I strongly suggest that you keep the timestamps outside of (separate from) whichever digest function you use. Digests are inherently unorderable with respect to time. The open part of the question is: which shortcuts I can apply to avoid exchanging full lists of messages? In the following assume: messageNo is a monotonically increase number from oldest to newest; messageId is timestamp+digest[+deviceId+deviceId]. Swap counts; same and you're done. Otherwise; swap messageId of the message at messageNo (smaller count/2) from the beginning. Same; add (smaller count/4) to previous messageNo (smaller count/2) and exchange messageIds. goto step 3. Different; subtract (smaller count/4) from previous messageNo and exchange messageIds. goto step 3. Basically; do a binary search (oldest->newest) to locate the first difference; and then move forward from that point; exchanging first messageIds; then full messages if needed. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. I'm with torvalds on this Agile (and TDD) debunked I told'em LLVM was the way to go. But did they listen!	[reply]
Re^4: [OT] Merging messages between users on different machines by bitingduck (Deacon) on Jul 01, 2015 at 04:47 UTC
I strongly suggest that you keep the timestamps outside of (separate from) whichever digest function you use. Digests are inherently unorderable with respect to time. I've been a little busy this week so didn't bother looking back at this thread, but my (unstated) reasoning behind putting the timestamps inside the hash was for easy sortability on a single field, at the expense of easy time ordering. It also guarantees that short messages that may be have dupes in the past or future get unique digests. It might be reasonable to include the timestamp both inside the digest and outside as a separate column, depending on the application.	[reply]