in reply to Verifying 2 databases are in sync with brute force

It takes 2 hours to dump a backup

Iterating over the whole database in Perl will take even longer, so dumping both DBs and comparing checksums of the dumps is probably faster than any pure-Perl solution.
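A rough sketch of that, assuming `host1`/`host2`, `myuser` and `mydb` are placeholders for your real connection details (the `--clean` flag matches the pg_dump invocation mentioned further down):

```shell
# Checksum each dump instead of storing it, so only two digests
# ever need comparing. host1/host2/myuser/mydb are placeholders.
sum1=$(pg_dump --clean -h host1 -U myuser mydb | md5sum | cut -d' ' -f1)
sum2=$(pg_dump --clean -h host2 -U myuser mydb | md5sum | cut -d' ' -f1)
[ "$sum1" = "$sum2" ] && echo "in sync" || echo "databases differ"
```

Piping straight into md5sum also avoids writing the 2-hour dumps to disk at all.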

There's also pg-comparator, which might be worth looking at.

Another possible trick is to aggregate all rows of a table into a single value using array_agg or one of its companions, and compute a checksum in Postgres itself (or maybe there's one built in already?). That way much less data needs to be transferred out of the DB, which should make the process considerably faster.
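As a sketch of that idea (the table name `mytable` and key column `id` are placeholders): PostgreSQL does ship a built-in md5() function, and 9.0+ also has string_agg, so a whole table can be reduced to one digest server-side:

```sql
-- One digest per table; the ORDER BY inside the aggregate makes the
-- result deterministic, so it can be compared across servers.
SELECT md5(string_agg(t::text, '|' ORDER BY id)) AS table_digest
FROM mytable AS t;
```

Run the same query against both databases and only a single short row per table crosses the wire.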


Re^2: Verifying 2 databases are in sync with brute force
by dwhite20899 (Friar) on May 31, 2011 at 17:26 UTC
    I didn't even consider comparing the dumps - I just took it for granted the files would be different due to dated comments or non-deterministic output. But I'll give it a try!

    pg-comparator looks interesting. Thanks!

      I just dumped the same DB twice in a row (with pg_dump --clean -h localhost -U <user> <dbname>) and both files are identical: no timestamp information and nothing non-deterministic in this particular DB.
Re^2: Verifying 2 databases are in sync with brute force
by perlfan (Parson) on Jun 01, 2011 at 14:32 UTC
    I like the aggregation idea. If you had a table with an entry for every other table (perhaps even itself), each containing a checksum of some aggregate of the respective table, you could be reasonably sure the databases were in sync. Updating the checksum table should be an ongoing process, to avoid re-checksumming every table in full each time.
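    A minimal sketch of such a bookkeeping table (all names here are hypothetical), with each entry refreshed by a scheduled job:

    ```sql
    -- Hypothetical bookkeeping table: one checksum row per tracked table.
    CREATE TABLE table_checksums (
        table_name text PRIMARY KEY,
        checksum   text NOT NULL,
        updated_at timestamptz NOT NULL DEFAULT now()
    );

    -- Refresh the entry for one table (mytable/id are placeholders).
    DELETE FROM table_checksums WHERE table_name = 'mytable';
    INSERT INTO table_checksums (table_name, checksum)
    SELECT 'mytable', md5(string_agg(t::text, '|' ORDER BY id))
    FROM mytable AS t;
    ```

    Checking sync then means comparing the two small table_checksums tables rather than every row of every table.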