maxl90 has asked for the wisdom of the Perl Monks concerning the following question:

I was recently given the task of pulling out file changes between different versions of the same directory structure. If a file's modification time differs between the two versions, that file and its location should be printed. My initial reaction is to use File::Find, loop through each file in the different versions, and do a side-by-side comparison. I'm pretty sure this would work, but it would take a lifetime to run, and the process will have to be repeated each time there is a version upgrade, which is quite often. If anyone has any suggestions, they would be greatly appreciated.

Replies are listed 'Best First'.
Re: Directory comparison
by msemtd (Scribe) on Jun 09, 2003 at 15:47 UTC
    You seem to imply that this may take a long time to run. It shouldn't take any longer than is necessary for the size of the trees. Or do you mean that the "side-by-side comparison" would take a long time? There are visual file comparison tools that can compare trees. If you are on a Win32 platform, I would recommend WinMerge (open source).

    Personally, I would use GNU diff (available for most platforms) to do the quickest possible comparison for me...

    diff --recursive --brief dir1 dir2
    Believe me when I say that it's very unlikely that Perl would be able to do it any quicker. There happens to be a --side-by-side option to GNU diff too!

    If I still felt it necessary to get Perl involved (say, for scheduled automation), I would capture and parse the results as appropriate.
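A minimal sketch of the "capture and parse" step, assuming GNU diff's English-locale messages ("Files X and Y differ" and "Only in DIR: NAME"). The sub name parse_brief_diff and the report layout are made up for illustration, and file names containing " and " or ": " would confuse these simple patterns. Note that diff compares contents, not modification times.

```perl
use strict;
use warnings;

# Parse the line-oriented output of `diff --recursive --brief`.
# Returns a hash ref: differ => list of [old, new] pairs,
# only => list of paths that exist in just one tree.
sub parse_brief_diff {
    my @lines = @_;
    my %report = (differ => [], only => []);
    for my $line (@lines) {
        chomp $line;
        if ($line =~ /^Files (.+) and (.+) differ$/) {
            push @{ $report{differ} }, [$1, $2];
        }
        elsif ($line =~ /^Only in (.+?): (.+)$/) {
            push @{ $report{only} }, "$1/$2";
        }
    }
    return \%report;
}

# Run only when called with two directories on the command line.
if (@ARGV == 2) {
    my ($dir1, $dir2) = @ARGV;
    # List-form pipe open: no shell, so no quoting problems.
    open(my $diff, '-|', 'diff', '--recursive', '--brief', $dir1, $dir2)
        or die "cannot run diff: $!";
    my $report = parse_brief_diff(<$diff>);
    close $diff;    # diff exits with status 1 when it finds differences
    print "changed: $_->[0]\n" for @{ $report->{differ} };
    print "only:    $_\n"      for @{ $report->{only} };
}
```

Keeping the parser separate from the code that runs diff makes it easy to feed it saved output from a scheduled job.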

Re: Directory comparison (no call-backs)
by tye (Sage) on Jun 09, 2003 at 16:31 UTC

    This is a perfect example of when not to use File::Find. It really isn't very hard to roll your own directory tree searcher, while the File::Find call-back interface makes it impossible to traverse two directory trees at once.

    Besides the standard gotchas to watch out for (don't follow symbolic links unless you do the extra work required; when you readdir anything other than ".", prepend the directory to the returned file names before using them to get information about the files; don't use a global directory handle in the opendir calls of a recursive subroutine; don't recurse into "." or ".."), also realize that readdir doesn't return file names in sorted order (while almost all globs do). So you'll want to sort (ignoring case if the file system ignores case) before doing a merge-sort-style comparison to find missing/added/different files and directories (or files that became directories or vice versa).
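The walk described above might be sketched as follows. The names sorted_entries and compare_trees and the status strings are invented for this example, and it assumes a case-sensitive file system (the sort would need to fold case otherwise). Symbolic links are never followed, and only modification times are compared.

```perl
use strict;
use warnings;

# Sorted entry names of one directory, without "." and "..".
sub sorted_entries {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "opendir $dir: $!";   # lexical handle, safe when recursing
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return sort @names;
}

# Walk two trees in step, calling $report->($status, $relative_path).
sub compare_trees {
    my ($old, $new, $report, $rel) = @_;
    $rel = '' unless defined $rel;
    my @old_names = sorted_entries($old);
    my @new_names = sorted_entries($new);
    while (@old_names || @new_names) {
        # Merge step: whichever list holds the smaller name advances.
        my $cmp = !@old_names ? 1
                : !@new_names ? -1
                : $old_names[0] cmp $new_names[0];
        if ($cmp < 0) { $report->('removed', $rel . shift @old_names); next }
        if ($cmp > 0) { $report->('added',   $rel . shift @new_names); next }

        my $name = shift @old_names;
        shift @new_names;
        # readdir returns bare names, so prepend the directory.
        my ($po, $pn) = ("$old/$name", "$new/$name");
        my $dir_o = (!-l $po && -d $po) ? 1 : 0;   # a symlink never counts as a directory
        my $dir_n = (!-l $pn && -d $pn) ? 1 : 0;

        if ($dir_o && $dir_n)    { compare_trees($po, $pn, $report, "$rel$name/") }
        elsif ($dir_o != $dir_n) { $report->('type changed', "$rel$name") }
        elsif ((lstat $po)[9] != (lstat $pn)[9]) { $report->('modified', "$rel$name") }
    }
}

# Run only when called with two directories on the command line.
if (@ARGV == 2) {
    compare_trees(@ARGV, sub { print "$_[0]: $_[1]\n" });
}
```

Because both listings are sorted the same way, each directory pair is handled in a single pass, and a subtree that exists on only one side is reported once instead of being descended into.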

                    - tye
Re: Directory comparison
by bobn (Chaplain) on Jun 09, 2003 at 15:29 UTC
    Use File::Find and print each full path and filename, with its mod time, on one line. Save the output to a file whose name carries a time-and-date stamp.

    Now you can use diff or sdiff (if you're on a Unix or Unix-like system), or diff.pl or Cygwin's diff or sdiff (if in Winblows), to compare any two files.

    Or you can use a simple Perl script to read the two files and note which files are new, deleted, or changed.
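A rough sketch of both halves: writing a listing and comparing two saved listings. The "mtime, tab, relative path" line format, the sub names and the list/compare command words are all assumptions made for this example, and paths containing tabs or newlines would break it. Paths are stored relative to the scanned root so that listings taken from two different directories line up.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;

# Write one "mtime<TAB>relative/path" line per plain file under $root.
sub write_listing {
    my ($root, $listing) = @_;
    open(my $out, '>', $listing) or die "open $listing: $!";
    find({ no_chdir => 1, wanted => sub {
        return unless -f $_;                     # with no_chdir, $_ is the full path
        my $mtime = (stat _)[9];                 # reuse the stat done by -f
        print {$out} $mtime, "\t", File::Spec->abs2rel($_, $root), "\n";
    } }, $root);
    close $out or die "close $listing: $!";
}

# Read a listing back into a hash: relative path => mtime.
sub read_listing {
    my ($listing) = @_;
    open(my $in, '<', $listing) or die "open $listing: $!";
    my %mtime;
    while (my $line = <$in>) {
        chomp $line;
        my ($mtime, $path) = split /\t/, $line, 2;
        $mtime{$path} = $mtime;
    }
    close $in;
    return \%mtime;
}

# Returns three array refs: added, deleted and changed paths.
sub compare_listings {
    my ($old, $new) = map { read_listing($_) } @_;
    my @added   = grep { !exists $old->{$_} } keys %$new;
    my @deleted = grep { !exists $new->{$_} } keys %$old;
    my @changed = grep { exists $new->{$_} && $old->{$_} != $new->{$_} } keys %$old;
    return ([sort @added], [sort @deleted], [sort @changed]);
}

# Usage: script list ROOT LISTING   or   script compare OLD NEW
my ($cmd, @args) = @ARGV;
if (defined $cmd && $cmd eq 'list' && @args == 2) {
    write_listing(@args);
}
elsif (defined $cmd && $cmd eq 'compare' && @args == 2) {
    my ($added, $deleted, $changed) = compare_listings(@args);
    print "new: $_\n"     for @$added;
    print "deleted: $_\n" for @$deleted;
    print "changed: $_\n" for @$changed;
}
```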

    Or you might consider RCS or CVS or some other source code control system.

    --Bob Niederman, http://bob-n.com
•Re: Directory comparison
by merlyn (Sage) on Jun 09, 2003 at 17:36 UTC
Re: Directory comparison
by pzbagel (Chaplain) on Jun 09, 2003 at 15:52 UTC

    Here's a little idea for you. If you are just checking modified times and not actual content, you can simply do something like:

    1. Do a File::Find on the old directory structure
    2. Store the information in a hash with the filenames (with full paths) as keys and the modified date/time as the values.
    3. Once you finish, do a File::Find on the new directory structure
    4. Compare the modify date/times of the new files with what you have stored for the old files

    I don't know how many files you are talking about? Thousands? Millions? Don't underestimate Perl's speed, if all you are checking is modify times then your bottleneck will be the Disk I/O not your script. And that would be the case with other solutions as well. If you are concerned about time and efficiency, you can separate the two loops above and have the resulting hash from the File::Find on the old directory structure stored using the Storable module. That way you can cache the mod times of the files before you need to compare them to the new directory structure and then just retrieve them when you run the script against the new directory.
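The steps above, including the Storable cache, could be sketched roughly like this. The sub names and the cache/check command words are hypothetical. One deliberate deviation: keys are paths relative to each root rather than full paths, since full paths from two different roots would never match.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;
use Storable qw(store retrieve);

# Steps 1-2: map each plain file under $root (relative path) to its mtime.
sub scan_mtimes {
    my ($root) = @_;
    my %mtime;
    find({ no_chdir => 1, wanted => sub {
        return unless -f $_;                                   # $_ is the full path here
        $mtime{ File::Spec->abs2rel($_, $root) } = (stat _)[9];
    } }, $root);
    return \%mtime;
}

# Step 4: files present in both scans whose mtimes differ.
sub changed_files {
    my ($old, $new) = @_;
    my @changed = grep { exists $old->{$_} && $old->{$_} != $new->{$_} } keys %$new;
    return sort @changed;
}

# Usage: script cache OLD_DIR CACHE_FILE   (before the upgrade)
#        script check CACHE_FILE NEW_DIR   (after the upgrade)
my ($cmd, @args) = @ARGV;
if (defined $cmd && $cmd eq 'cache' && @args == 2) {
    store(scan_mtimes($args[0]), $args[1]);
}
elsif (defined $cmd && $cmd eq 'check' && @args == 2) {
    my $old = retrieve($args[0]);
    print "$_\n" for changed_files($old, scan_mtimes($args[1]));
}
```

Splitting the work into a cache run and a check run means the old tree only has to be scanned once, and it no longer needs to exist when the new version is checked.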

    HTH

Re: Directory comparison
by wufnik (Friar) on Jun 10, 2003 at 08:32 UTC
    good morning.

    again, all the good things have been mostly said, so here is a snippet of code that will give you, after you supply two directories, the common files, the files unique to the first directory, the files unique to the second, and those that have been modified.

    to determine modification we use industrial strength (gisle aas') MD5 rather than relying on file sizes, timestamps, etc. ie: we determine any differences in file content, rather than modification time. this works well with Ken Williams' Tie::TextDir, the other module used. the approach is good for source (and even binaries, i find), but you may want to look at times, in which case... go elsewhere!

    use Tie::TextDir;
    use Digest::MD5;

    my ($dirA, $dirB) = @ARGV;

    # Each tied hash maps a file name to that file's contents.
    tie my %filsA, 'Tie::TextDir', ($dirA || "."), 'rw';
    tie my %filsB, 'Tie::TextDir', ($dirB || "."), 'rw';

    my (%common, @uniqA, @uniqB, @modified);

    # Files in A: digest the ones also present in B, collect the rest.
    for my $name (keys %filsA) {
        if (exists $filsB{$name}) {
            $common{$name} = Digest::MD5::md5_base64($filsA{$name});
        }
        else {
            push @uniqA, $name;
        }
    }
    # Files that exist only in B.
    for my $name (keys %filsB) {
        push @uniqB, $name unless exists $filsA{$name};
    }
    # A common file counts as modified when its digest in B differs from A's.
    @modified = grep { Digest::MD5::md5_base64($filsB{$_}) ne $common{$_} } keys %common;

    untie %filsA;
    untie %filsB;

    print "** common to $dirA, $dirB **\n" . join "\n", keys %common;
    print "\n** unique to $dirA **\n" . join "\n", @uniqA;
    # @uniqB here; printing @uniqA would just repeat the previous list.
    print "\n** unique to $dirB **\n" . join "\n", @uniqB;
    print "\n** modified files **\n" . join "\n", @modified;
    print "\n** finito **\n";
    which produces output like

    ** common to chaffwin, chaffwin2 **
    TO_DO
    md5.h
    README
    COPYING
    ChangeLog
    Makefile
    chaffwin.el
    chaffwin.o
    md5.o
    chaffwin.c
    chaffwin.pl
    md5.c
    CVS
    ** unique to chaffwin **
    chaffwin.exe
    oot
    ** unique to chaffwin2 **
    chaffwin.exe
    oot
    ** modified files **
    chaffwin.o
    ** finito **


    hope that helps

    ...wufnik

    -- in the world of the mules there are no rules --

Re: Directory comparison
by PetaMem (Priest) on Jun 10, 2003 at 07:06 UTC
    You may want to have a look at my diffy node. It's old code and could be done much more elegantly these days, but it is probably a good starting point for your task.

    Bye
     PetaMem

Re: Directory comparison
by zentara (Cardinal) on Jun 10, 2003 at 14:37 UTC
    I just saw File::Dircmp on cpan.

    SYNOPSIS
        use File::Dircmp;
        @r = dircmp($dir1, $dir2, $diff, $suppress);

    DESCRIPTION
        The dircmp command examines dir1 and dir2 and generates various tabulated information about the contents of the directories. Listings of files that are unique to each directory are generated for all the options. If no option is entered, a list is output indicating whether the file names common to both directories have the same contents.

Re: Directory comparison
by maxl90 (Sexton) on Jun 09, 2003 at 17:43 UTC
    Thanks for the suggestions, and keep them coming if anyone has any more. They have helped greatly. On a side note, I’m working on a Unix box to do this.
Re: Directory comparison
by Anonymous Monk on Jun 10, 2003 at 16:37 UTC
    You need to create titles.txt like this:
    dir /s > titles.txt

    Then you can work with the code below to put
    in the file time/date stamps as you need it.


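    # Package variables, so the format at the bottom can see them.
    our ($filename, $dirval) = ('', '');

    open(my $fd1, '<', 'titles.txt') or die "open: $!";

    while (my $line1 = <$fd1>) {
        chomp $line1;

        # Remember the most recent "Directory of ..." header line.
        if ($line1 =~ /Directory/) {
            $dirval = $line1;
        }

        # Report matching file entries, skipping headers and <DIR> rows.
        if ($line1 =~ /g1\.jpg/ && $line1 !~ /Directory|<DIR>/) {
            # Skip the first 39 columns of the entry; the rest is the file name.
            if ($line1 =~ /.{39}(.*)$/) {
                $filename = $1;
                write;
            }
        }
    }
    close($fd1);
    exit();

    format STDOUT =
    @<<<<<<<<<<<<<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    $filename, $dirval
    .

    D:\<snip>>perl mygrep.pl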
    message.jpg1.jpg Directory of S:\<snip>\images
    editFunding1.jpg Directory of S:\<snip>\images
    funding1.jpg Directory of S:\<snip>\images
    tracking1.jpg Directory of S:\<snip>\images
    viewFunding1.jpg Directory of S:\<snip>\images
    g1.jpg Directory of S:\<snip>\weirdos
    g1.jpg Directory of S:\<snip>\Personal
    wag1.jpg Directory of S:\<snip>\Internet redesig
    staff-org1.jpg Directory of S:\<snip>\cutouts
    training1.jpg Directory of S:\<snip>\cutouts
    winning1.jpg Directory of S:\<snip>
    free_bg1.jpg Directory of S:\<snip>\PP 2000
    swefbldg1.jpg Directory of S:\<snip>\<snip>grist