in reply to Comparing tables over multiple servers

Rather than thinking of this as an any-to-any comparison, with the quadratic growth in comparisons that results, designate one machine as the 'master' and have it send its table to the other six (n) machines, which simply update/replace their copy from the master.

No comparison necessary. A straight delete and replace of the table is going to be much quicker than a (multi-way) comparison.
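A minimal sketch of that push in Python (the servers and table are simulated here as in-memory dicts purely for illustration; a real version would TRUNCATE and bulk-INSERT over a database handle such as DBI):

```python
# Simulate a master-push sync: no comparison, just delete and replace.
# Each "server" is modelled as a dict mapping table name -> list of rows;
# a real implementation would issue TRUNCATE + INSERT over a DB connection.

def push_table(master, slaves, table):
    """Replace each slave's copy of `table` with the master's copy."""
    rows = master[table]
    for slave in slaves:
        slave[table] = list(rows)  # wholesale replacement, no row-by-row diff

# Hypothetical reference table on the master and six stale slaves.
master = {"ref_codes": [(1, "A"), (2, "B"), (3, "C")]}
slaves = [{"ref_codes": [(1, "A")]} for _ in range(6)]

push_table(master, slaves, "ref_codes")
# every slave now holds an identical copy of the master's table
```

The whole operation is one pass over the data per slave; there is no pairwise comparison step at all.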


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Comparing tables over multiple servers
by rnahi (Curate) on Oct 27, 2005 at 19:48 UTC

    Would you still recommend this solution if the table size were, say, 100 GB?

    Think about locking all seven tables until copies are delivered. (If you don't, the table you pass to the slaves may be different from the one that is being modified in the master copy.)

      Hmm. If the tables are that large, then you have a problem anyway. No matter how you arrange to do the comparison, it still requires that at least one copy of the table be transmitted between each pair of machines in order for the comparison to take place.

      Having transmitted that data, a blanket replacement will always be quicker than a compare. The time spent locked whilst transmitting remains a constant and the time spent locked replacing will pale into insignificance relative to the time spent comparing.

      If the tables are this large and dynamic (from what the OP said, his seemed to be relatively small & static), then you would obviously need to use some kind of dynamic replication or true distributed updates, but that is quite a different situation and a considerably more involved undertaking than sync'ing a few, small, relatively static reference tables, which was my interpretation of the OP situation.

      Maintaining multiple copies of large, dynamic tables in a distributed DB environment is the current holy grail of RDBMS development, still a proprietary black art for the most part, and hardly a sane subject for a "quick perl script" to undertake.


        it still requires that at least one copy of the table be transmitted between each pair of machines in order for the comparison to take place

        Not necessarily. As this solution suggests, you can calculate a global CRC locally on each server, and then compare the results.
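        A sketch of that checksum idea in Python: each server hashes its own rows locally (sorting first so physical row order doesn't matter), and only the short digests cross the network. The row data below is made up for illustration:

        ```python
        import hashlib

        def table_checksum(rows):
            """Digest a table's contents; sort so row order is irrelevant."""
            h = hashlib.md5()
            for row in sorted(rows):
                h.update(repr(row).encode("utf-8"))
            return h.hexdigest()

        master_rows = [(1, "A"), (2, "B"), (3, "C")]
        slave_rows  = [(3, "C"), (1, "A"), (2, "B")]   # same data, different order
        stale_rows  = [(1, "A"), (2, "X")]             # out of date

        # Only these 32-character digests need to be exchanged,
        # not the tables themselves.
        print(table_checksum(master_rows) == table_checksum(slave_rows))  # True
        print(table_checksum(master_rows) == table_checksum(stale_rows))  # False
        ```

        Only servers whose digest differs from the master's need a fresh copy shipped to them, so a 100 GB table that is already in sync costs almost nothing to verify.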