dimes has asked for the wisdom of the Perl Monks concerning the following question:

Is there a way to diff two files that is 'native' to Perl 5.6.1, i.e. without a module from outside the standard distro? I am in the process of converting a ksh script I wrote. It renames a local $file to $file_old, retrieves $file via Net::FTP and saves it locally. I then need to compare it to the old one to see if it's different, and pass it on for processing if it is. I have pretty much the whole script working except the diff. I have looked through my books and spent some time googling, but haven't come across a 'native' way to do this, just references to Algorithm::Diff. Thanks, ye Monks'O'Wisdom Dimes

Re: File Diff'ing
by Corion (Patriarch) on Jun 12, 2002 at 15:07 UTC

    Even though you talk about "Diff'ing", you don't seem to want the (minimal) set of changes that turns one file into the other; you only want to know whether the two files are identical (or that's how I read your words).

    There are several ways to achieve what you want. The easiest would be to use Digest::MD5, which may already be installed with your Perl 5.6 (it only joined the core distribution in 5.8). If the two files have an identical MD5 hash, they are almost certainly the same.
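
    A minimal sketch of the MD5 comparison, assuming Digest::MD5 is installed. The helper md5_of_file and the file names are hypothetical; the names only mirror the $file / $file_old naming from the question.

```perl
use strict;
use warnings;
use Digest::MD5;

# Return the hex MD5 digest of a file's contents.
sub md5_of_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);    # hash the raw bytes, no newline translation
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $digest;
}

# Hypothetical file names for illustration only.
my ($new, $old) = ('data.txt', 'data.txt_old');
if (-e $new && -e $old) {
    print md5_of_file($new) eq md5_of_file($old) ? "same\n" : "different\n";
}
```

    Equal digests mean the files are almost certainly identical; different digests mean they definitely differ.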

    If your version of Perl doesn't have Digest::MD5, you might want to do the check manually: first check whether the two files have the same size (via the tell function or the -s file test operator; see perldoc -f -X), and then slurp the two files into memory and do an eq comparison on them.
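
    The manual check could look roughly like this, using only built-ins. The helper names slurp and files_identical are made up for this sketch, and it assumes both files fit comfortably in memory.

```perl
use strict;
use warnings;

# Read a whole file into a scalar as raw bytes.
sub slurp {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);
    local $/;                 # no record separator: read everything at once
    my $content = <$fh>;
    close($fh);
    return defined $content ? $content : '';
}

# True if both files exist and hold exactly the same bytes.
sub files_identical {
    my ($file_a, $file_b) = @_;
    return 0 unless -e $file_a && -e $file_b;
    # Cheap test first: different sizes can never be identical.
    return 0 if ((-s $file_a) || 0) != ((-s $file_b) || 0);
    return slurp($file_a) eq slurp($file_b) ? 1 : 0;
}
```

    The size test means the files are only read when they could actually be equal.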

    If the files are too large to be held in memory at once, you might want to compare small chunks of the two files one at a time, starting from either the beginning or the end of the file, whichever is more likely to differ.
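
    A sketch of the chunked variant, reading from the beginning of both files. The function name files_identical_chunked and the 64 KB default chunk size are arbitrary choices for illustration, not anything prescribed above.

```perl
use strict;
use warnings;

# Compare two files chunk by chunk, stopping at the first difference.
sub files_identical_chunked {
    my ($file_a, $file_b, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;    # assumed default, tune as needed

    open(my $fh_a, '<', $file_a) or die "Cannot open $file_a: $!";
    open(my $fh_b, '<', $file_b) or die "Cannot open $file_b: $!";
    binmode($fh_a);
    binmode($fh_b);

    my $same = 1;
    while (1) {
        my $read_a = read($fh_a, my $buf_a, $chunk_size);
        my $read_b = read($fh_b, my $buf_b, $chunk_size);
        die "Read error: $!" unless defined $read_a && defined $read_b;
        if ($read_a != $read_b || $buf_a ne $buf_b) {
            $same = 0;            # lengths or bytes differ
            last;
        }
        last if $read_a == 0;     # both files reached EOF together
    }
    close($fh_a);
    close($fh_b);
    return $same;
}
```

    Only two chunks are held in memory at any time, so the file size no longer matters.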

    perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
    $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
    ($c = $d->accept())->get_request(); $c->send_response( new #in the
    HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
Re: File Diff'ing
by robobunny (Friar) on Jun 12, 2002 at 15:11 UTC
    I don't know of anything that comes with the standard distribution, but you probably want to do a checksum instead of a diff. It will be much faster if you don't care at that point what the actual differences are. You can store the checksum when you download the file, so that you don't have to recompute it when you download the next one.
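
    One way to store the checksum between downloads, sketched under the assumption that Digest::MD5 is available. The sidecar file that holds the checksum, and the helper names md5_of_file and changed_since_last_time, are hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5;

# Return the hex MD5 digest of a file's contents.
sub md5_of_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $digest;
}

# True if $file differs from the checksum saved in $sum_file (or if no
# checksum was saved yet). The saved checksum is updated on a change.
sub changed_since_last_time {
    my ($file, $sum_file) = @_;
    my $new_sum = md5_of_file($file);

    my $old_sum = '';
    if (open(my $in, '<', $sum_file)) {
        $old_sum = <$in>;
        $old_sum = '' unless defined $old_sum;
        chomp $old_sum;
        close($in);
    }
    return 0 if $new_sum eq $old_sum;

    open(my $out, '>', $sum_file) or die "Cannot write $sum_file: $!";
    print $out "$new_sum\n";
    close($out);
    return 1;
}
```

    With this, only the freshly downloaded file is hashed each time, and the old copy need not be kept for comparison.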
Re: File Diff'ing
by Abigail-II (Bishop) on Jun 12, 2002 at 15:13 UTC
    What's this problem with using a module outside of the standard distribution? As you said, there's Algorithm::Diff. If you have a xenomodule phobia, I doubt the license prevents you from just pasting the content of Algorithm::Diff into your file.

    Abigail

      I am not particularly phobic of modules...but for something that I "see" as pretty basic....it seemed that I shouldn't have to go external to stock perl just to test to see if two files are the same or not. Thanks all for the tips...I was leaning towards "fingerprinting" the files via md5 et al. and now it seems pretty clear that it is the way to go. Thanks again Dimes
Re: File Diff'ing
by kvale (Monsignor) on Jun 12, 2002 at 19:22 UTC
    Depending on the exact nature of the problem, even Digest::MD5 might be overkill.

    If your problem is just to compare an old file to a new file once to see if they differ, then MD5 is unnecessary. Simply compare their sizes. If they differ, you are done. If they are the same, open the two files and compare line by line, breaking out of the loop at the first difference. Easy and faster than hashing both files first.
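
    As an alternative to hand-rolling that loop, the File::Compare module, which ships with the standard distribution, does a very similar direct comparison. This swaps in a different technique from the line-by-line loop described above; the wrapper name files_differ is made up for the sketch.

```perl
use strict;
use warnings;
use File::Compare;

# True if the two files differ, false if they are identical.
# Dies if either file cannot be read.
sub files_differ {
    my ($file_a, $file_b) = @_;
    my $result = compare($file_a, $file_b);   # 0 equal, 1 different, -1 error
    die "Cannot compare $file_a and $file_b: $!" if $result < 0;
    return $result == 1 ? 1 : 0;
}
```

    In the script from the question this would be called as files_differ($file, $file_old) right after the download.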

    If your problem is to compare a new file against many old files (to reject duplicates) or to compare many new files to an old file (sample and do something when a file is updated) then hashing to an MD5 signature is the fastest approach. Although it depends on differing files generating differing signatures, the chance of a collision is, well, you should live so long :)

    -Mark