in reply to Re^2: Comparing tables over multiple servers
in thread Comparing tables over multiple servers

Hmm. If the tables are that large, then you have a problem anyway. No matter how you arrange the comparison, at least one copy of the table must be transmitted between each pair of machines for the comparison to take place.

Having transmitted that data, a blanket replacement will always be quicker than a compare. The time spent locked whilst transmitting remains constant either way, and the time spent locked while replacing pales into insignificance next to the time spent comparing.
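
Roughly what I mean, as a minimal DBI sketch (the DSNs, table, and columns here are invented, and error handling is omitted):

    use strict;
    use warnings;
    use DBI;

    my $src = DBI->connect( 'dbi:mysql:refdb;host=master',  'user', 'pass',
        { RaiseError => 1 } );
    my $dst = DBI->connect( 'dbi:mysql:refdb;host=replica', 'user', 'pass',
        { RaiseError => 1, AutoCommit => 0 } );

    ## Pull the whole (small, static) reference table once...
    my $rows = $src->selectall_arrayref(
        'SELECT id, name, value FROM ref_table ORDER BY id' );

    ## ...and replace the replica's copy inside a single transaction,
    ## so the replica is only locked for the reload, never for a compare.
    $dst->do( 'DELETE FROM ref_table' );
    my $ins = $dst->prepare(
        'INSERT INTO ref_table ( id, name, value ) VALUES ( ?, ?, ? )' );
    $ins->execute( @$_ ) for @$rows;
    $dst->commit;

For really big tables you would reach for the engine's bulk loader rather than row-by-row inserts, but the shape of the job is the same.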

If the tables really are this large and dynamic (from what the OP said, his seemed to be relatively small and static), then you obviously need some kind of dynamic replication or true distributed updates. But that is quite a different situation, and a considerably more involved undertaking than syncing a few small, relatively static reference tables, which was my reading of the OP's situation.

Maintaining multiple copies of large dynamic tables in a distributed DB environment is the current holy grail of RDBMS development, still a proprietary black art for the most part, and hardly a sane job for a "quick perl script".


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^4: Comparing tables over multiple servers
by rnahi (Curate) on Oct 27, 2005 at 20:24 UTC
    at least one copy of the table must be transmitted between each pair of machines for the comparison to take place

    Not necessarily. As this solution suggests, you can calculate a global CRC locally on each server, and then compare the results.
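
    A minimal sketch of that approach (assuming MySQL's CHECKSUM TABLE; the DSNs and table name are invented):

        use strict;
        use warnings;
        use DBI;

        my @servers = ( 'dbi:mysql:refdb;host=server1',
                        'dbi:mysql:refdb;host=server2' );

        my %crc;
        for my $dsn ( @servers ) {
            my $dbh = DBI->connect( $dsn, 'user', 'pass', { RaiseError => 1 } );
            ## The checksum is computed server-side; only this one
            ## small row ever crosses the network.
            my ( undef, $sum ) = $dbh->selectrow_array( 'CHECKSUM TABLE ref_table' );
            $crc{ $dsn } = $sum;
            $dbh->disconnect;
        }

        my %uniq     = map { $_ => 1 } values %crc;
        my $distinct = keys %uniq;    # scalar context: count of distinct sums
        print $distinct == 1 ? "tables match\n" : "tables differ\n";

    CHECKSUM TABLE is MySQL-specific; on another engine you would compute the digest with a server-side function, or with a script running locally on each box, instead.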

      Okay. You got me there. Though I do have to wonder about the time spent calculating the CRCs (using SQL) relative to transmitting the data. If the machines are connected by a high-speed (10 Mbit/s or greater) network connection, I would think it would be a close-run thing. For scale: 10 Mbit/s is at best ~1.25 MB/s, so a 1 GB table needs something like 800 seconds on the wire; the CRC only wins if each server can scan and hash its copy faster than that.

      It would be quicker if you could CRC the underlying filesystem entities in which the data is stored, but of course, any difference in the transaction histories, database configurations or versions, or even the disk drives would throw the comparison out.

      And once you have discovered that there is a difference, you still have to correct it, and you're right back where you started: needing to transmit the data.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        Just think about this: to move a table across the network, you can either (a) make a physical copy, which includes the data, indexes, triggers, transaction caches, and whatever else the DBMS keeps in store for that table, or (b) copy just the data, as SQL statements or exportable text, in which case the receiving DBMS must rebuild the indexes itself.

        To calculate a CRC, by contrast, you need nothing more than a table scan.

        Therefore you should compare the time needed to calculate a SHA twice per record (once for the record itself, and once to fold that digest into a running global SHA) against the time needed to push all of the above over the network.

        I put my bet on SHA. :)
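
        In Perl terms, something like this minimal sketch (assuming DBI and Digest::SHA; the DSN, table, and columns are invented), run locally on each server:

            use strict;
            use warnings;
            use DBI;
            use Digest::SHA;

            my $dbh = DBI->connect( 'dbi:mysql:refdb;host=localhost', 'user', 'pass',
                { RaiseError => 1 } );

            my $global = Digest::SHA->new( 256 );

            ## ORDER BY the key so every server folds the rows in the same order.
            my $sth = $dbh->prepare(
                'SELECT id, name, value FROM ref_table ORDER BY id' );
            $sth->execute;

            while ( my @row = $sth->fetchrow_array ) {
                ## One SHA for the record...
                my $record = Digest::SHA->new( 256 )
                    ->add( join "\0", map { defined() ? $_ : '' } @row );
                ## ...and one to fold it into the table-wide digest.
                $global->add( $record->digest );
            }

            ## Compare this one short string between servers, instead of the table.
            print $global->hexdigest, "\n";

        The ORDER BY matters: without a stable row order, two identical tables could fold to different digests.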