in reply to Re: Testing database updates.
in thread Testing database updates.

I concur, with one proviso: you probably want to make sure that your dump script dumps records in a predictable order. Most of the dump scripts bundled with databases do not guarantee any particular order of results. You may need to write your own simple dump script.
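
Something along these lines would do as a starting point -- a minimal sketch using DBI, with SQLite purely for illustration; the DSN, table name and key column are placeholders for whatever you actually have:

    use strict;
    use warnings;
    use DBI;

    # Dump every row of a table in a predictable order, so that two
    # dumps of equivalent databases compare equal line-for-line.
    my $dbh = DBI->connect( 'dbi:SQLite:dbname=test.db', '', '',
                            { RaiseError => 1 } );

    # Order by the primary key; without an ORDER BY, the row order is
    # whatever the engine happens to give you.
    my $sth = $dbh->prepare( 'SELECT * FROM mytable ORDER BY id' );
    $sth->execute;

    while ( my @row = $sth->fetchrow_array ) {
        print join( "\t", map { defined $_ ? $_ : 'NULL' } @row ), "\n";
    }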

perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

Re^3: Testing database updates.
by BrowserUk (Patriarch) on Apr 18, 2012 at 22:59 UTC

    Though these dump utilities don't offer any particular ordering, in general, given the same versions, switches, etc., they will produce the same ordering for the same source material. I.e. they don't have a random component.

    Whilst it is possible that two DBs arrived at via different routes -- i.e. a different sequencing of inserts, updates and deletes -- may produce subtly different dumps due to that ordering, dumps that the OP might decide to consider "identical", it will require a deliberate decision to treat them as such.

    That is, whilst diff'ing the dumps may produce "false negatives", it will require a human's judgement to determine that they are indeed false.

    It is also possible to post-process the dumps into some 'canonical' format -- e.g. trim whitespace, sort, etc. -- but that should rarely be necessary.
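
    For instance, a minimal sketch of such a normalisation pass (the file names and the particular normalisations are illustrative only):

        use strict;
        use warnings;

        # Canonicalise a dump: trim trailing whitespace, collapse runs of
        # blanks, and sort, so row order no longer affects the comparison.
        my ( $in, $out ) = @ARGV;
        open my $ifh, '<', $in  or die "open $in: $!";
        open my $ofh, '>', $out or die "open $out: $!";

        my @lines;
        while ( my $line = <$ifh> ) {
            chomp $line;
            $line =~ s/\s+\z//;       # trim trailing whitespace
            $line =~ s/[ \t]+/ /g;    # collapse internal runs of blanks
            push @lines, $line;
        }
        print {$ofh} "$_\n" for sort @lines;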

    For the purposes of checking refactoring, it is safe to assume that the OP would be targeting the same versions of the DBMS & utility for both the reference and check dumps; and any variations between them will be of interest, even if ultimately dismissed as irrelevant.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      Values from sequences (e.g. postgres' serial datatype) can (validly) vary, and will often thwart the otherwise attractively simple dump & diff approach.

      And IMHO tobyink is right: without an index, order is not guaranteed, and I'd consider it unwise to rely on order in the database on the basis of the order of the source material.

      Just a general remark: the OP doesn't give much detail, and doesn't mention what system he works with, but in case the system is PostgreSQL: it has an md5() function which can be handy for comparing data directly (and for searching out differences via SQL).
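
      For example (a sketch only: the table and key column are made up, DBD::Pg is assumed, and string_agg() needs PostgreSQL 9.0 or later):

          use strict;
          use warnings;
          use DBI;

          my $dbh = DBI->connect( 'dbi:Pg:dbname=testdb', '', '',
                                  { RaiseError => 1 } );

          # One checksum per table: serialise each row to text, order by
          # the primary key, and digest the lot inside the database.
          my ($digest) = $dbh->selectrow_array( q{
              SELECT md5( string_agg( t::text, '' ORDER BY t.id ) )
              FROM   mytable t
          } );
          print "mytable: $digest\n";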

        Values from sequences (e.g. postgres' serial datatype) can (validly) vary, and will often thwart the otherwise attractively simple dump & diff approach.

        Assuming a known starting position, the same sequence of operations applied to that starting position, and the same versions, would you not expect even such serial types to attain the same values and be dumped in the same order?

        Of course, the OP said: "I don't care about the order of inserts, or about any reads, just that the overall state at the end is what it should be.", which could mean he would choose to ignore subtle differences in sequence data-types; but were I refactoring such code, I'd at least like to be informed of any differences before choosing to dismiss them. That assumes, of course, that the problem actually exists in his dataset.

        I'd consider it unwise to rely on order in the database on the basis of the order of the source material.

        There is no risk attached, because any such differences would be flagged by the diff, even if they turn out to be "false negatives".

        My personal take is that I'd much rather start out looking at everything and explicitly choose what I can disregard -- probably through post-processing the dumps -- than prejudge the issues by preselecting what to compare.

        I'd also steer clear of tying myself to the facilities of any given DBMS if there is any chance that the process might need to be ported to others.

        I'd also avoid reliance upon digest-type comparisons -- except as a quick check to avoid more detailed inspection, which can equally be applied directly to the dump files -- as a failed digest comparison only tells you that something is different, not what.
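
        A minimal sketch of that quick check, applied to the dump files (the file names are illustrative; Digest::MD5 ships with perl):

            use strict;
            use warnings;
            use Digest::MD5;

            # Digest a whole file. Equal digests let us skip the detailed
            # inspection; unequal digests say only *that* the dumps differ.
            sub md5_of {
                my ($file) = @_;
                open my $fh, '<', $file or die "open $file: $!";
                binmode $fh;
                return Digest::MD5->new->addfile( $fh )->hexdigest;
            }

            my ( $ref, $chk ) = ( 'reference.dump', 'check.dump' );
            print md5_of( $ref ) eq md5_of( $chk )
                ? "dumps identical -- nothing more to do\n"
                : "dumps differ -- diff them to find out where\n";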

        But this is just the way I would go about tackling the OP's problem. He is obviously free to choose between this, all the other suggestions offered, his own ideas, or any combination thereof.

        Feel free to add yours to the list.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        The start of some sanity?