michbach has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,
I'm looking for the fastest way to compare two picture files (e.g. bmp or jpg) with Perl. Is there a Perl-specific command for this, or does Perl only use the normal operating-system commands like "comp"? I only want to know whether the files differ or not, so the fastest command would probably be one that stops comparing at the first difference and returns the result immediately.
I use Windows XP Pro, and I first tried the XP command "comp" (code: `comp file1 file2`;). But comp always asks interactively whether to compare further, and I didn't find a way to avoid this. That aside, is there a good Perl command to do this job?
Thanks for any answer! Best regards michbach.

OK, I see that what is so clear in my mind is ambiguous in my question! Here is exactly what I do. I take a screenshot of a certain desktop area and save it as a jpg file named e.g. tested_1.jpg. After a while (e.g. 1 day) I take a screenshot of the same desktop area again and save it as a jpg file named tested_2.jpg. Now I want to know whether file1 has the same content as file2. If the content is the same, nothing has changed in this desktop area. If the content is different, something must have changed in the desktop area. I'm looking for the fastest way to do this in Perl. I don't want to bind in C code or the like. I hope I have explained a little better what I want. Finally, thanks to all responders!

Replies are listed 'Best First'.
Re: Fast way to compare two picture-files
by ikegami (Patriarch) on Dec 29, 2008 at 01:52 UTC

    Sounds like you want to compare the files and not the pictures?

    sub comp {
        my ($qfn1, $qfn2) = @_;

        open(my $fh1, '<:raw:stdio', $qfn1)
            or die("Unable to open file \"$qfn1\": $!\n");
        open(my $fh2, '<:raw:stdio', $qfn2)
            or die("Unable to open file \"$qfn2\": $!\n");

        # Different sizes mean different files; no need to read anything.
        my $size1 = -f $fh1 && -s _;
        my $size2 = -f $fh2 && -s _;
        return 0 if $size1 && $size2 && $size1 != $size2;

        # Read fixed-size 16K records and stop at the first mismatch.
        local $/ = \(16*1024);
        for (;;) {
            my $blk1 = <$fh1>;
            my $blk2 = <$fh2>;
            return 0 if defined($blk1) xor defined($blk2);  # one file ended early
            return 1 if !defined($blk1);                    # both ended: identical
            return 0 if $blk1 ne $blk2;
        }
    }

    Taking a digest (such as MD5) of the file would speed things up if you need to compare a file against multiple other files.

    Update: Added size check up front.
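
The digest idea for the one-against-many case could be sketched roughly as follows. This is only an illustration, not code from the post above: the helper names file_md5, build_index and candidates_for are hypothetical, and it assumes the core Digest::MD5 module. Each known file is digested once, and later lookups only need to digest the new file.

```perl
use strict;
use warnings;
use Digest::MD5;

# Return the hex MD5 digest of a file's raw bytes.
sub file_md5 {
    my ($path) = @_;
    open(my $fh, '<:raw', $path)
        or die("Unable to open file \"$path\": $!\n");
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $digest;
}

# Hypothetical helper: build a digest => [paths] index once,
# so that it can be reused for many later lookups.
sub build_index {
    my %index;
    push @{ $index{ file_md5($_) } }, $_ for @_;
    return \%index;
}

# Hypothetical helper: list the indexed paths whose digest matches
# that of $path. Matches are candidates only; a byte-for-byte
# comparison is still needed to rule out a collision.
sub candidates_for {
    my ($index, $path) = @_;
    return @{ $index->{ file_md5($path) } || [] };
}
```

For just two files this reads both files completely, so the block-by-block loop above remains the better fit for that case.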

      Why write it manually? Someone else already did it.
      use File::Compare; compare("foo.jpg", "bar.jpg");
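
A slightly fuller sketch of the File::Compare route, for what it is worth. compare() returns 0 when the files are equal, 1 when they differ and -1 on error, so the result should not be used directly as a boolean. The wrapper name files_identical is made up for this example.

```perl
use strict;
use warnings;
use File::Compare;

# Hypothetical wrapper: returns 1 if the files are identical,
# 0 if they differ, and dies if either file cannot be read.
sub files_identical {
    my ($path1, $path2) = @_;
    my $result = compare($path1, $path2);
    die("Unable to compare \"$path1\" and \"$path2\": $!\n") if $result < 0;
    return $result == 0 ? 1 : 0;
}
```

With the OP's screenshots this might be called as files_identical('tested_1.jpg', 'tested_2.jpg'), treating a false result as "something changed on the desktop area".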

      Depending on what the OP is asking for, checking the first 16K of the file won't provide the correct answer if a) the OP is trying to determine whether the images are the same (rather than just whether the files are identical) and b) the files are in different formats (say, JPG and BMP) but are actually the same image.

      Update: Oops -- ikegami is quite right, the loop checks the entire file, not just the first 16K.

      Alex / talexb / Toronto

      "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

        The code checks the entire file (not just the first 16K). "for (;;)" can be read as "for ever". And the post already says the code compares files not images.

Re: Fast way to compare two picture-files
by blue_cowdawg (Monsignor) on Dec 29, 2008 at 01:54 UTC
        looking for the fastest way to compare two picture-Files

    I'm not sure whether you are trying to check that "this picture file is a duplicate of that picture file" or trying to do some sort of image recognition, so I don't know if this is what you are asking for.

    If you are just looking at two files to see if they are duplicates, one thought I have is to do an MD5 sum on the two files and see if the hashes match. If they match, chances are they are the same image. If not, they are definitely not the same image.


    Peter L. Berghold -- Unix Professional
    Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg

      Why do people always suggest finding the digest to compare two files? Graciously assuming calculating the digest is instantaneous, using the digest method requires reading the entire file whereas one can usually exit after one pair of reads using a byte for byte comparison.

      Using the digest method is great when comparing a file against multiple others over a long time period. It's a poor method when comparing two files.

      If they match chances are they are the same image.

      If you're going to follow up by checking the files byte for byte, it might make more sense to find the digest of the first X bytes of the file instead of the digest of the entire file.
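
One way to read the "digest of the first X bytes" suggestion is sketched below. It is an assumption-laden illustration: prefix_md5 and same_file_via_prefix are hypothetical names, the 4096-byte default is an arbitrary choice, and the final check simply delegates to the core File::Compare module. In practice the prefix digest would be the value stored for later lookups.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Compare;

# Hex MD5 of at most $len leading bytes of a file.
sub prefix_md5 {
    my ($path, $len) = @_;
    open(my $fh, '<:raw', $path)
        or die("Unable to open file \"$path\": $!\n");
    my $buf = '';
    my $got = read($fh, $buf, $len);
    die("Unable to read file \"$path\": $!\n") if !defined($got);
    close($fh);
    return md5_hex($buf);
}

# Cheap prefix check first; full byte-for-byte comparison only
# when the prefixes agree. Returns 1 if identical, 0 otherwise.
sub same_file_via_prefix {
    my ($path1, $path2, $len) = @_;
    $len ||= 4096;    # assumed default prefix length
    return 0 if prefix_md5($path1, $len) ne prefix_md5($path2, $len);
    my $result = compare($path1, $path2);
    die("Unable to compare \"$path1\" and \"$path2\": $!\n") if $result < 0;
    return $result == 0 ? 1 : 0;
}
```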

        It's out of similar considerations that some time ago I wrote dupseek. Even for multiple files, just using parts of file contents instead of digests works quite well (and rules out even the very small risk of collisions).


        The stupider the astronaut, the easier it is to win the trip to Vega - A. Tucket
Re: Fast way to compare two picture-files
by spx2 (Deacon) on Dec 29, 2008 at 09:14 UTC
    It depends very much on the degree to which you want to compare those images. I fully agree with ikegami that if you want to know whether two files are exactly the same, you can just use a byte-for-byte comparison instead of computing hashes for the files. In the worst case (which is rare in practice) you will have lots of duplicates, and then MD5 will pay off. But all of this matters only if you run your scripts on substantial amounts of data; if you are dealing with small amounts, it's not worth the trouble of optimizing.

    Also, maybe what you are trying to do is not a byte-for-byte match but a similarity match, where the images are very similar. Someone has thought of this and has written something that takes the difference between the pixels of two images: if the differences are below a certain mindist (a threshold value), then the images are said to be similar; otherwise they are different. The code for that is in the Image::Filters module over here.

    Things can also get more complicated if you consider comparing images that are a scaling of one another, or ones that are a translation of one another. You could use the Hausdorff distance, an implementation of that in pure C is here, but to apply this you probably first have to use some kind of edge detection. These things are explained here in a bit more detail. Some people have also written some nice software for searching images based on similarity, called imgseek. So first decide what exactly you mean by image comparison, and then choose what is most appropriate.
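
The threshold idea can be illustrated with a small pure-Perl sketch. It is not the Image::Filters API, whose interface is not shown here. It assumes both images have already been decoded into equally sized lists of grayscale values (0 to 255), and it uses the mean absolute pixel difference as the distance, which is just one possible choice. The names mean_abs_diff and images_similar are hypothetical.

```perl
use strict;
use warnings;

# Mean absolute difference between two equally sized lists of
# pixel values (array references).
sub mean_abs_diff {
    my ($pix1, $pix2) = @_;
    die("Pixel lists differ in length\n") if @$pix1 != @$pix2;
    return 0 if !@$pix1;
    my $total = 0;
    $total += abs($pix1->[$_] - $pix2->[$_]) for 0 .. $#$pix1;
    return $total / @$pix1;
}

# Images count as similar when the distance does not exceed
# the threshold $mindist.
sub images_similar {
    my ($pix1, $pix2, $mindist) = @_;
    return mean_abs_diff($pix1, $pix2) <= $mindist ? 1 : 0;
}
```

For JPEG screenshots a small non-zero threshold may be needed, since lossy compression can change individual pixel values even when the visible content is the same.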