in reply to Comparing tables over multiple servers
I'm really surprised that no one has mentioned checksumming for this operation.
It would take a lot of processing, but I'm pretty sure that any technique for comparing tables is going to be processor-intensive. For large tables, an in-memory comparison of the actual values will not work at all, because the working set (your entire table size times the number of tables being compared) will not fit into RAM plus swap.
Checksumming helps by producing a unique (or nearly unique) value for the incoming data without having to store the data itself. For instance, you can generate an MD5 sum of a given string, and it comes back as 32 hexadecimal characters (128 bits). If you join the fields of each row together (with a separator or length prefix, so that "ab","c" and "a","bc" don't turn into the same string), MD5 the result, and compare it with the MD5 sums of the corresponding rows in the other tables, you can find out whether the rows differ. Of course, once you have determined that two rows are different, you still have to reprocess them to find out HOW they are different.
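A minimal Python sketch of the per-row idea, assuming each row can be matched across servers by a primary key and that every field arrives as a string or None (SQL NULL). The names `row_md5`, `row_digests` and `differing_keys` are hypothetical, not from any library, and fetching the rows from each server is left out. Only key-to-digest pairs are kept, never the row data.

```python
import hashlib
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

# Assumption: a row is a sequence of field values, each a str or None (SQL NULL).
Row = Sequence[Optional[str]]


def row_md5(row: Row) -> str:
    """Return the 32-hex-character MD5 of one row's fields."""
    h = hashlib.md5()
    for field in row:
        if field is None:
            h.update(b"N;")  # NULL gets its own marker, distinct from the string "None"
        else:
            data = field.encode("utf-8")
            # Length prefix keeps ("ab", "c") and ("a", "bc") from colliding.
            h.update(b"S%d:" % len(data))
            h.update(data)
    return h.hexdigest()


def row_digests(rows: Iterable[Tuple[str, Row]]) -> Dict[str, str]:
    """Keep only primary key -> digest; the row data itself is not stored."""
    return {key: row_md5(row) for key, row in rows}


def differing_keys(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Keys whose rows differ or exist on only one side; these need reprocessing."""
    return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))


if __name__ == "__main__":
    server_a = [("1", ("alice", "30")), ("2", ("bob", None))]
    server_b = [("1", ("alice", "30")), ("2", ("bob", "41")), ("3", ("carol", "22"))]
    print(differing_keys(row_digests(server_a), row_digests(server_b)))  # ['2', '3']
```

Row "2" differs in one field and row "3" exists only on the second server, so those are the two rows you would go back and fetch in full.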
This can be extended even further by MD5ing several rows at once. That increases the reprocessing needed to find out WHICH rows are different and HOW, but it decreases the memory requirements further still.
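One way to MD5 several rows at once, sketched in Python under a few assumptions: the primary key is an integer, both servers return rows sorted by it (e.g. `ORDER BY id`), and rows are grouped by key range. The function names and the `block_size` parameter are hypothetical. Each block yields a single digest, so memory drops to one hash per block, and a mismatch tells you which key range to re-fetch and compare row by row.

```python
import hashlib
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

Row = Sequence[Optional[str]]


def _feed_row(h, key: int, row: Row) -> None:
    """Add one row to a running MD5, with unambiguous key and field boundaries."""
    h.update(b"K%d;" % key)
    for field in row:
        if field is None:
            h.update(b"N;")  # marker for SQL NULL
        else:
            data = field.encode("utf-8")
            h.update(b"S%d:" % len(data))  # length prefix separates fields
            h.update(data)


def block_digests(rows: Iterable[Tuple[int, Row]], block_size: int = 1000) -> Dict[int, str]:
    """Map block number (key // block_size) -> one MD5 over every row in that block.

    Assumes rows arrive sorted by integer primary key, so each block's rows
    are fed in the same order on every server.
    """
    digests: Dict[int, str] = {}
    current = None
    h = None
    for key, row in rows:
        block = key // block_size
        if block != current:
            if h is not None:
                digests[current] = h.hexdigest()  # previous block is complete
            current, h = block, hashlib.md5()
        _feed_row(h, key, row)
    if h is not None:
        digests[current] = h.hexdigest()  # flush the last block
    return digests


def differing_blocks(a: Dict[int, str], b: Dict[int, str]) -> List[int]:
    """Blocks to re-fetch; block n covers keys n*block_size .. (n+1)*block_size - 1."""
    return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))
```

Grouping by key range rather than by "every N rows" is a deliberate choice: a single inserted or deleted row then only changes its own block instead of shifting every block boundary after it. `block_size` is the knob for the trade-off described above: bigger blocks mean fewer digests to hold, but more rows to reprocess when one block differs.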
Taken to obscene levels, you could conceivably compute one MD5 sum per table and compare those. If they match, the tables are the same (barring an astronomically unlikely hash collision). If they differ, the tables are different, but then of course you still need to do more work to find out WHERE they differ.
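The one-sum-per-table extreme, as a minimal Python sketch. It assumes every server can stream its rows in the same order (e.g. `SELECT ... ORDER BY` the primary key) as (integer key, fields) pairs; `table_md5` is a hypothetical name. The hash is updated row by row, so memory use stays constant however big the table is.

```python
import hashlib
from typing import Iterable, Optional, Sequence, Tuple


def table_md5(rows: Iterable[Tuple[int, Sequence[Optional[str]]]]) -> str:
    """One MD5 over a whole table, streamed row by row in constant memory.

    Assumes every server returns rows in the same order; the same data in a
    different order produces a different sum.
    """
    h = hashlib.md5()
    for key, row in rows:
        h.update(b"R%d;" % key)  # row boundary plus primary key
        for field in row:
            if field is None:
                h.update(b"N;")  # marker for SQL NULL
            else:
                data = field.encode("utf-8")
                h.update(b"S%d:" % len(data))  # length prefix separates fields
                h.update(data)
    return h.hexdigest()
```

Comparing the two 32-character results answers only "same or not". On a mismatch you drop back to per-block or per-row sums to locate the difference.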