cascadekee has asked for the wisdom of the Perl Monks concerning the following question:

I want to compare 2 text files and get an output saying they're the same or different. compare() looks like the right function, but it doesn't always seem to work for me. I have 2 files that are different, but compare() is telling me they're the same. Are there any good ways to debug such a problem? I've checked that I'm inputting the correct files, but how can I find out why compare() doesn't like them?

Re: File::Compare::compare() problem
by graff (Chancellor) on Aug 22, 2008 at 01:29 UTC
    I have 2 files that are different, but compare() is telling me they're the same.

    Believe me when I say I really trust you, but I think it would be a lot more satisfying for all concerned if you could post some sample data, a code snippet, and the results of an actual run of that code on that data to demonstrate the problem.

    I looked at the source code for File::Compare (you can do that too), and it seems both cleverly done and reasonably solid. So I'm inclined to say you need to prove your assertion that it says two files are the same even when they are not. For example, I proved (to a small degree) that the module works as advertised, with this sequence of five shell commands:

    $ echo blah > /tmp/j1
    $ echo blah > /tmp/j2
    $ perl -MFile::Compare -le '$r=compare("/tmp/j1","/tmp/j2"); print $r'
    0
    $ echo blag > /tmp/j2
    $ perl -MFile::Compare -le '$r=compare("/tmp/j1","/tmp/j2"); print $r'
    1
    The return value "0" means the files are the same (which was true in the first run) and "1" means they are different (which was true in the second run). A return value of "-1" would mean an error occurred, such as a file that could not be opened. Can you provide a counter-example?
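Since no code was posted, the following is only a guess at one possible cause. compare() returns 0 for "same", so a plain boolean test such as `if (compare($a, $b))` treats "same" as false, and treats the error value -1 as true. A minimal sketch that checks each return value explicitly (the helper name `describe_compare` is made up for illustration):

```perl
use strict;
use warnings;
use File::Compare;

# Hypothetical helper: turn compare()'s numeric result into a message.
sub describe_compare {
    my ($file_a, $file_b) = @_;
    my $result = compare($file_a, $file_b);

    return 'same'      if $result == 0;   # 0 is false in boolean context
    return 'different' if $result == 1;
    # Anything else (documented as -1) means compare() hit an error.
    return 'error: a file could not be opened or read';
}

# Run as: perl script.pl file1 file2
if (@ARGV == 2) {
    print describe_compare(@ARGV), "\n";
}
```

If this prints "error" for your two files, the problem is with opening or reading the files rather than with the comparison itself.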

    UPDATE: In case the problem only appears with some pair of really big files, you could also try posting the sizes and MD5 signatures of the two files (see Digest::MD5, or just use the GNU "md5sum" tool). Obviously, files of different sizes cannot be the same (and File::Compare checks that first), and two files of the same size but with different checksums also cannot be the same.

    Identical size and checksum is not a guarantee of identical content, but at least you only have to do a full content comparison on file pairs that match on those two factors (and computing/comparing checksums is cheaper than doing comparisons of all file data).
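For the size and checksum report, something along these lines should do. It is a sketch using Digest::MD5 (the helper name `size_and_md5` is made up for illustration) that prints the byte size and MD5 hex digest of each file named on the command line:

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: return (size in bytes, MD5 hex digest) for a file.
sub size_and_md5 {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    binmode $fh;                       # read raw bytes, no newline translation
    my $size = -s $fh;
    my $md5  = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return ($size, $md5);
}

# Run as: perl script.pl file1 file2
for my $path (@ARGV) {
    my ($size, $md5) = size_and_md5($path);
    printf "%10d  %s  %s\n", $size, $md5, $path;
}
```

If either column differs between the two files, they are different, and compare() should be returning 1 for them.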

Re: File::Compare::compare() problem
by moritz (Cardinal) on Aug 21, 2008 at 22:22 UTC
    You can view the difference between these files with the diff utility (on unixish systems):
    diff -u file1 file2

    Maybe that helps you while debugging?

Re: File::Compare::compare() problem
by gone2015 (Deacon) on Aug 21, 2008 at 22:23 UTC

    I see that File::Compare::compare takes an optional 3rd parameter, a reference to a subroutine which will be called to compare each pair of lines. You could provide a subroutine and get an inside view of what's going on.
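A rough sketch of that idea follows. It assumes the behaviour described in the File::Compare documentation, where the subroutine receives one line from each file and returns 0 if they are equal and a true value if they differ. The helper name `compare_verbose` is made up for illustration:

```perl
use strict;
use warnings;
use File::Compare;

# Hypothetical helper: compare two files line by line and report the first
# pair of lines that differ. Returns whatever compare() returns.
sub compare_verbose {
    my ($file_a, $file_b) = @_;
    my $line_no = 0;

    return compare($file_a, $file_b, sub {
        my ($line_a, $line_b) = @_;
        $line_no++;
        return 0 if $line_a eq $line_b;   # 0 tells compare() the lines match
        print STDERR "line $line_no differs:\n  a: $line_a  b: $line_b";
        return 1;                         # true tells compare() they differ
    });
}

# Run as: perl script.pl file1 file2
if (@ARGV == 2) {
    print compare_verbose(@ARGV), "\n";
}
```

If the subroutine is never called, or never sees a differing pair for files you believe are different, that narrows down where to look next.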