Win has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

What is the best way, using Perl if necessary, of making sure that a subset of tables on 4 difference databases distibuted on 4 different computers, within the same network, are the same? I guess the best way of doing this is to check the table sizes. Any thoughts welcome.
  • Comment on Automated checking for identical MS SQL tables

Replies are listed 'Best First'.
Re: Automated checking for identical MS SQL tables
by john_oshea (Priest) on Jan 13, 2006 at 12:34 UTC

    Dependent on the database in question, you may find that table sizes may not actually be the same for the same data on different machines. In particular, PostgreSQL (which I'm most familiar with) may have different amounts of unused space on each machine, dependent on how recently VACUUM has been run.

    Assuming you're checking the data (rather than the schema), I'd be tempted to take a data-only dump of each table (using the same backup mechanism on all 4 machines), calculate an MD5 hash for each dump, then compare those. If you're dealing with large amounts of data this may not be particularly feasible, and this almost certainly won't work if the machines aren't all the same architecture.

    I've no idea if this is the best way of doing this, but it's what sprung to mind - am very curious to see the other replies to your question to be honest ;-)

    Update: I can't believe I missed the glaringly-obvious 'MS SQL' in the title. More coffee required for me. I still think the MD5 approach is worth considering mind you.

      If you want some code to read a db handle and MD5 hash each entry in turn, you can get it from here - look for the subroutine 'sourceToAoH'.

      It might be handy to generalise the record hashing a bit more, so that it can be used for other purposes - I'll take a look at that for the next version of this module.

      --------------------------------------------------------------

      "If there is such a phenomenon as absolute evil, it consists in treating another human being as a thing."

      John Brunner, "The Shockwave Rider".

Re: Automated checking for identical MS SQL tables
by marto (Cardinal) on Jan 13, 2006 at 12:34 UTC
    Win,

    Just because the table sizes are the same, does not Necessarily mean that the data is identical. You may want to read up on replication for your chosen database platform.
    Check out Replication Developer InfoCenter on msdn

    Hoep this helps.

    Martin
      Replication perhaps is an issue that I will have to look at, although I was expecting to replicate the SPROCS through a Perl program that I have and the tables through the enterprise manager.

      I guess that I could write a Perl program to look at the number of rows in each table, for each database, and then make sure that they tally. Although this does not guarantee that the tables have not been tampered with, it will probably be good enough for my purposes.