Your use of statistics is somewhat misguided. The images that people look at are not remotely random, and one cannot assume that they are. pengvado is correct in saying that in raw bitmaps (particularly of few-colored things), there is a high probability of getting identical bytes. If you want to suggest "simple" methods to check if two files are identical before doing full MD5s, go ahead. (Filesize, first few bytes, a random byte, and CRCs are all good suggestions.) Since the OP hinted at a hashing method, please keep in mind the birthday paradox: a collision somewhere among many objects is much likelier than a collision between just two. (Incidentally, your equations are missing a "1 -" on the LHS and may be more simply written as 1/256 and 1/256^2.)
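
For what it's worth, the cheap checks listed above could be chained roughly like this. It is only a sketch in Python, not anything posted in this thread: the function names, the `head_bytes` default and the chunk size are all made up for illustration, and the random-byte probe is left out to keep the result deterministic.

```python
import hashlib
import os
import zlib


def _digests(path, chunk_size=1 << 16):
    """Read the file once and return (crc32, md5 hex digest)."""
    crc = 0
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
            md5.update(chunk)
    return crc, md5.hexdigest()


def same_contents(path_a, path_b, head_bytes=64):
    """Return False as soon as a cheap check proves the files differ.

    head_bytes is an arbitrary illustrative default, not a tuned value.
    """
    # 1. Files of different sizes cannot be identical.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False

    # 2. Compare the first few bytes before reading everything.
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        if fa.read(head_bytes) != fb.read(head_bytes):
            return False

    # 3. Only now pay for a full read: CRC first, MD5 as the final word.
    crc_a, md5_a = _digests(path_a)
    crc_b, md5_b = _digests(path_b)
    if crc_a != crc_b:
        return False
    return md5_a == md5_b
```

Steps 1 and 2 are where the savings are, since they reject most mismatches without reading the whole file. A `True` result still only means "same MD5", which is where the birthday paradox comes in once many files are being compared.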

To summarize: Equally sized and colored Canadian and Chinese flags would have about a 50% chance of differing at a random byte (they are mostly the same shade of red). Similarly with United States and Japanese flags (they share a lot of white).
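
The "chance of differing at a random byte" is just the fraction of byte positions at which two equally sized buffers disagree. A minimal sketch of measuring it, in Python; `differing_fraction` is a hypothetical helper name, and the two buffers are toy stand-ins, not real flag bitmaps:

```python
def differing_fraction(a: bytes, b: bytes) -> float:
    """Probability that a uniformly chosen byte offset differs between a and b."""
    if len(a) != len(b):
        raise ValueError("buffers must be the same length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Toy buffers: the first half agrees, the second half does not.
left = b"\xff" * 4 + b"\x00" * 4
right = b"\xff" * 8
print(differing_fraction(left, right))  # 0.5
```

With a fraction of 0.5, a single random-byte probe misses the difference half the time, and k independent probes all miss it with probability 0.5^k.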

++pengvado.

Re^5: Comparing images
by BrowserUk (Patriarch) on Nov 28, 2006 at 01:48 UTC
    If you want to suggest "simple" methods to check if two files are identical before doing full MD5s, go ahead.

    Thank you. That's exactly what I already did. And all I did.

    But...

    Equally sized and colored Canadian and Chinese flags would have about a 50% chance of differing at a random byte (they are mostly the same shade of red). Similarly with United States and Japanese flags (they share a lot of white).

    If these images are produced by a graphics program, you may (sometimes) be correct. That is, if both images are produced by the same author, or both authors choose exactly the same shades of red or white. Maybe.

    However, if these images are photographs, taken by different cameras, and/or under different lighting conditions, and/or rippled by different winds, and/or catching reflected light from differently colored surroundings, and/or are differently aged and therefore faded, and/or made of differing materials with respectively differing moduli of reflection, and/or the lenses are dirty, and/or the cameras are differently focused, and/or compressed at differing ratios/qualities, and/or dozens of other factors...

    Your idealised judgement is wrong.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      However, if these images are photographs, ...

      The OP never mentioned photographs.

        Okay. Here's a challenge for you. Find two images, one of the Chinese flag, one of the Canadian flag, posted on different web sites, that would not be differentiated by their filesize + the color value of their middle pixel.

        Good luck.

