I'm really surprised that noone has mentioned checksumming for this operation.

While it would take a lot of processing, I'm pretty sure that any technique to compare tables is going to take a lot of processing power. For some large tables, in memory comparison of actual values will not work at all, as the working set (your entire table size * number of tables to be compared) will not fit into ram + swap.

Checksumming will help by providing a unique (or nearly so) value for incoming data, while not having to store the data. For instance, you can generate a MD5 sum of a given string, and it will return 32 characters. If you join the fields of your table together, md5sum that, and compare it to the MD5sums from the other tables, you can find out if they are different. Of course, once you have determined that the rows are different, you will have to reprocess them to find out HOW they are different.

This can even be extended further, by MD5ing multiple rows at the same time - which will increase your reprocressing time to find out HOW and WHICH ROW are different, but will further decrease the memory requirements.

Taken to obscene levels, you could conceivably make one MD5sum per table, and compare those. If they are the same, the tables are the same. If they are different, the tables are different - but then of course you still need to do more work to find out WHERE the tables are different


In reply to Re: Comparing tables over multiple servers by Tuppence
in thread Comparing tables over multiple servers by mnlight

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.