Directory comparison

maxl90 has asked for the wisdom of the Perl Monks concerning the following question:

Re: Directory comparison by msemtd (Scribe) on Jun 09, 2003 at 15:47 UTC
You seem to imply that this may take a long time to run. It shouldn't take any longer than the size of the trees requires. Or do you mean that the "side by side comparison" would take a long time? There are visual file comparison tools that can compare trees; if you are on a Win32 platform, I would recommend WinMerge (open source). Personally, I would use GNU diff (available for most platforms) to do the quickest possible comparison:

`diff --recursive --brief dir1 dir2`

Believe me when I say that it's very unlikely that Perl would be able to do it any quicker. There happens to be a `--side-by-side` option to GNU diff too! If I still felt it necessary to get Perl involved (say, for scheduled automation), I would capture and parse the results as appropriate.
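If you do want Perl involved, a minimal sketch of capturing and parsing GNU diff's brief output might look like this. It assumes the usual brief messages (`Only in DIR: NAME` and `Files A and B differ`) and ignores any other line; `parse_brief_diff` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: sort GNU "diff --recursive --brief" lines into buckets.
# Assumes the usual "Only in DIR: NAME" and "Files A and B differ" messages;
# any other line is ignored. Paths containing " and " would confuse the
# second pattern.
sub parse_brief_diff {
    my @lines = @_;
    my %result = (only_in => [], differ => []);
    for my $line (@lines) {
        chomp $line;
        if ($line =~ /^Only in (.+): (.+)$/) {
            push @{ $result{only_in} }, "$1/$2";
        }
        elsif ($line =~ /^Files (.+) and (.+) differ$/) {
            push @{ $result{differ} }, [$1, $2];
        }
    }
    return \%result;
}

# Only run diff when two directories are given on the command line.
if (@ARGV == 2) {
    my ($dir1, $dir2) = @ARGV;
    # List-form pipe open: no shell is involved, so odd names are safe.
    open(my $fh, '-|', 'diff', '--recursive', '--brief', $dir1, $dir2)
        or die "cannot run diff: $!";
    my $result = parse_brief_diff(<$fh>);
    close($fh);
    # diff exits with 0 (same), 1 (differences found) or 2 (trouble).
    die "diff reported an error\n" if ($? >> 8) > 1;
    print "only in: $_\n" for @{ $result->{only_in} };
    print "differ: $_->[0] vs $_->[1]\n" for @{ $result->{differ} };
}
```

Using the list form of the piped `open` avoids the shell, so directory names with spaces need no extra quoting.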
Re: Directory comparison (no call-backs) by tye (Sage) on Jun 09, 2003 at 16:31 UTC
This is a perfect example of when not to use File::Find. It really isn't very hard to roll your own directory-tree walker, whereas the File::Find call-back interface makes it impossible to traverse two directory trees at once.

Beside the standard gotchas to watch out for (don't follow symbolic links unless you do the extra work required; remember that when you readdir anything other than ".", you have to prepend the directory to the returned file names before using them to get information about the files; don't use a global directory handle in the opendir calls of a recursive subroutine; don't recurse into "." or ".."), also realize that readdir doesn't return file names in sorted order (while almost all globs do). So you'll want to sort (ignoring case if you are dealing with a file system that ignores case) before doing a merge-sort comparison to find missing, added, or different files and directories (or files that became directories, or vice versa). - tye
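A minimal sketch of that hand-rolled, lockstep walk, assuming a case-sensitive file system (for a case-insensitive one you would sort and compare with `lc`). The subroutine names are hypothetical, and "different" here only means "different size"; swap in an mtime or digest comparison as needed.

```perl
use strict;
use warnings;

# Sorted entries of one directory, without "." and "..".
# A lexical handle keeps recursion safe (no shared global DIR handle).
sub sorted_entries {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "opendir $dir: $!";
    my @names = sort { $a cmp $b } grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);
    return @names;
}

sub join_rel {
    my ($rel, $name) = @_;
    return length $rel ? "$rel/$name" : $name;
}

# Walk both trees together, merging the two sorted listings.
# $rel is the path relative to both roots; differences are pushed onto @$out.
sub compare_trees {
    my ($root_a, $root_b, $rel, $out) = @_;
    my $dir_a = length $rel ? "$root_a/$rel" : $root_a;
    my $dir_b = length $rel ? "$root_b/$rel" : $root_b;
    my @a = sorted_entries($dir_a);
    my @b = sorted_entries($dir_b);
    my ($i, $j) = (0, 0);
    while ($i < @a || $j < @b) {
        my $cmp = $i >= @a ? 1 : $j >= @b ? -1 : $a[$i] cmp $b[$j];
        if ($cmp < 0) {
            push @$out, "only in A: " . join_rel($rel, $a[$i++]);
        }
        elsif ($cmp > 0) {
            push @$out, "only in B: " . join_rel($rel, $b[$j++]);
        }
        else {
            my $name = $a[$i++];
            $j++;
            my $path = join_rel($rel, $name);
            # readdir returns bare names, so prepend the directory before testing.
            my ($pa, $pb) = ("$dir_a/$name", "$dir_b/$name");
            # Symlinks are treated as plain entries and never followed.
            my $da = -d $pa && !-l $pa;
            my $db = -d $pb && !-l $pb;
            if ($da && $db) {
                compare_trees($root_a, $root_b, $path, $out);
            }
            elsif ($da || $db) {
                push @$out, "type changed: $path";
            }
            elsif ((-s $pa || 0) != (-s $pb || 0)) {
                push @$out, "size differs: $path";
            }
        }
    }
}

if (@ARGV == 2) {
    my @diffs;
    compare_trees($ARGV[0], $ARGV[1], '', \@diffs);
    print "$_\n" for @diffs;
}
```

Because both listings are sorted with the same `cmp`, a single pass of the merge finds every added, missing, and changed entry at each level.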
Re: Directory comparison by bobn (Chaplain) on Jun 09, 2003 at 15:29 UTC
Use File::Find to print each file's full path and name, with its modification time, one per line. Save the output to a file named with the date and time. Now you can use diff or sdiff (on Unix or a Unix-like system), or diff.pl or Cygwin's diff or sdiff (on Windows), to compare any two of these files. Or you can use a simple Perl script to read two files and note which files are new, deleted, or changed. Or you might consider RCS, CVS, or some other source code control system. --Bob Niederman, http://bob-n.com
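A minimal sketch of the snapshot idea in Perl. The tab-separated `path<TAB>mtime` format, the `snapshot-YYYYMMDD-HHMMSS.txt` naming and the helper names (`write_snapshot`, `read_snapshot`, `compare_snapshots`) are assumptions for illustration. Lines are sorted by path so two snapshots also compare cleanly with plain diff or sdiff.

```perl
use strict;
use warnings;
use File::Find;
use POSIX qw(strftime);

# Write one "full path<TAB>mtime" line per regular file under $root, sorted.
sub write_snapshot {
    my ($root, $outfile) = @_;
    my %mtime;
    find({ no_chdir => 1, wanted => sub {
        return unless -f $File::Find::name;
        $mtime{$File::Find::name} = (stat _)[9];   # reuse the -f stat buffer
    } }, $root);
    open(my $out, '>', $outfile) or die "open $outfile: $!";
    print $out "$_\t$mtime{$_}\n" for sort keys %mtime;
    close($out) or die "close $outfile: $!";
}

# Read a snapshot file back into a hash ref of path => mtime.
sub read_snapshot {
    my ($file) = @_;
    open(my $in, '<', $file) or die "open $file: $!";
    my %mtime;
    while (my $line = <$in>) {
        chomp $line;
        # Split on the last tab, in case a path itself contains one.
        $mtime{$1} = $2 if $line =~ /^(.*)\t(\d+)$/;
    }
    close($in);
    return \%mtime;
}

# Compare an older and a newer snapshot: new, deleted and changed paths.
sub compare_snapshots {
    my ($old, $new) = map { read_snapshot($_) } @_;
    my @added   = sort { $a cmp $b } grep { !exists $old->{$_} } keys %$new;
    my @removed = sort { $a cmp $b } grep { !exists $new->{$_} } keys %$old;
    my @changed = sort { $a cmp $b }
                  grep { exists $new->{$_} && $new->{$_} != $old->{$_} } keys %$old;
    return (\@added, \@removed, \@changed);
}

if (@ARGV == 1) {          # one directory: write a time-stamped snapshot
    my $name = strftime('snapshot-%Y%m%d-%H%M%S.txt', localtime);
    write_snapshot($ARGV[0], $name);
    print "wrote $name\n";
}
elsif (@ARGV == 2) {       # two snapshot files: report the differences
    my ($added, $removed, $changed) = compare_snapshots(@ARGV);
    print "new: $_\n"     for @$added;
    print "deleted: $_\n" for @$removed;
    print "changed: $_\n" for @$changed;
}
```

Since full paths are stored, this compares the same tree at two points in time, which is the scheduled-snapshot use described above.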
Re: Directory comparison by merlyn (Sage) on Jun 09, 2003 at 17:36 UTC
I have a column describing an rsync-in-Perl program that you might want to use as a starting point for your tool.

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.
Re: Directory comparison by pzbagel (Chaplain) on Jun 09, 2003 at 15:52 UTC
Here's a little idea for you. If you are just checking modified times and not actual content, you can simply do something like:

1. Do a File::Find on the old directory structure.
2. Store the information in a hash, with the file names (with full paths) as keys and the modification date/time as values.
3. Once you finish, do a File::Find on the new directory structure.
4. Compare the modification dates/times of the new files with what you have stored for the old files.

How many files are you talking about? Thousands? Millions? Don't underestimate Perl's speed: if all you are checking is modification times, your bottleneck will be disk I/O, not your script, and that would be the case with other solutions as well.

If you are concerned about time and efficiency, you can separate the two loops above and store the hash from the File::Find on the old directory structure with the Storable module. That way you can cache the modification times of the files before you need them, and just retrieve them when you run the script against the new directory.

HTH
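A minimal sketch of those steps, with the Storable caching split into two runs. Two details are assumptions rather than part of the description: hash keys are paths relative to each tree's root (so an old and a new tree in different locations still line up), and the `save`/`compare` modes and helper names (`scan_mtimes`, `diff_mtimes`) are hypothetical.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;
use Storable qw(store retrieve);

# Map each regular file under $root (path relative to $root) to its mtime.
sub scan_mtimes {
    my ($root) = @_;
    my %mtime;
    find({ no_chdir => 1, wanted => sub {
        return unless -f $File::Find::name;
        $mtime{ File::Spec->abs2rel($File::Find::name, $root) } = (stat _)[9];
    } }, $root);
    return \%mtime;
}

# Compare old and new maps: added, removed and changed relative paths.
sub diff_mtimes {
    my ($old, $new) = @_;
    my @added   = sort { $a cmp $b } grep { !exists $old->{$_} } keys %$new;
    my @removed = sort { $a cmp $b } grep { !exists $new->{$_} } keys %$old;
    my @changed = sort { $a cmp $b }
                  grep { exists $old->{$_} && $old->{$_} != $new->{$_} } keys %$new;
    return (\@added, \@removed, \@changed);
}

# Hypothetical command-line modes:
#   perl mtimes.pl save OLD_DIR cache.sto      scan once and cache with Storable
#   perl mtimes.pl compare cache.sto NEW_DIR   later, compare against the cache
if (@ARGV == 3 && $ARGV[0] eq 'save') {
    store(scan_mtimes($ARGV[1]), $ARGV[2]) or die "store failed\n";
}
elsif (@ARGV == 3 && $ARGV[0] eq 'compare') {
    my ($added, $removed, $changed) =
        diff_mtimes(retrieve($ARGV[1]), scan_mtimes($ARGV[2]));
    print "added: $_\n"   for @$added;
    print "removed: $_\n" for @$removed;
    print "changed: $_\n" for @$changed;
}
```

Storing a plain hash ref keeps the cache format trivial: `retrieve` hands back exactly the structure `scan_mtimes` built.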
Re: Directory comparison by wufnik (Friar) on Jun 10, 2003 at 08:32 UTC
Good morning. Again, most of the good things have already been said, so here is a snippet of code that, given two directories, gives you the common files, the files unique to the first directory, the files unique to the second, and those that have been modified. To determine modification we use industrial-strength MD5 (Gisle Aas's Digest::MD5) rather than relying on file sizes, timestamps, etc.; i.e. we detect differences in file content rather than in modification time. This works well with Ken Williams' Tie::TextDir, the other module used. The approach is good for source (and even binaries, I find), but if you want to look at times... go elsewhere!

```perl
use strict;
use warnings;
use Tie::TextDir;
use Digest::MD5;

my ($dirA, $dirB) = @ARGV;
$dirA ||= ".";
$dirB ||= ".";

# Each tied hash maps file name => file contents for one directory
tie my %filsA, 'Tie::TextDir', $dirA, 'rw';
tie my %filsB, 'Tie::TextDir', $dirB, 'rw';

my (%common, @uniqA, @uniqB, @modified);

# Names in both directories: remember the MD5 of the A copy; the rest are unique to A
for (keys %filsA) {
    if (exists $filsB{$_}) {
        $common{$_} = Digest::MD5::md5_base64($filsA{$_});
    }
    else {
        push @uniqA, $_;
    }
}
@uniqB = grep { !exists $filsA{$_} } keys %filsB;

# Common files whose B copy hashes differently have been modified
@modified = grep { Digest::MD5::md5_base64($filsB{$_}) ne $common{$_} } keys %common;

untie %filsA;
untie %filsB;

print " common to $dirA, $dirB \n" . join "\n", keys %common;
print "\n unique to $dirA \n" . join "\n", @uniqA;
print "\n unique to $dirB \n" . join "\n", @uniqB;
print "\n modified files \n" . join "\n", @modified;
print "\n finito \n";
```

which produces output like this (this sample came from a version that printed @uniqA under both "unique to" headings, so the second list repeats the first):

```
common to chaffwin, chaffwin2
TO_DO
md5.h
README
COPYING
ChangeLog
Makefile
chaffwin.el
chaffwin.o
md5.o
chaffwin.c
chaffwin.pl
md5.c
CVS
unique to chaffwin
chaffwin.exe
oot
unique to chaffwin2
chaffwin.exe
oot
modified files
chaffwin.o
finito
```

Hope that helps.

...wufnik

-- in the world of the mules there are no rules --
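If you need to descend into subdirectories, or would rather not read each whole file into a string, a variant along these lines might help. It is a sketch that swaps Tie::TextDir for File::Find and feeds each open file handle to Digest::MD5's `addfile`; the helper names `md5_tree` and `compare_md5_trees` are hypothetical.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;
use Digest::MD5;

# Map each regular file under $root (relative path) to the MD5 of its contents.
# addfile() reads from the handle, so files are never slurped into one string.
sub md5_tree {
    my ($root) = @_;
    my %digest;
    find({ no_chdir => 1, wanted => sub {
        my $path = $File::Find::name;
        return unless -f $path;
        open(my $fh, '<', $path) or die "open $path: $!";
        binmode($fh);
        $digest{ File::Spec->abs2rel($path, $root) } =
            Digest::MD5->new->addfile($fh)->b64digest;
        close($fh);
    } }, $root);
    return \%digest;
}

# Common, unique-to-A, unique-to-B and modified relative paths (all sorted).
sub compare_md5_trees {
    my ($dir_a, $dir_b) = @_;
    my ($sums_a, $sums_b) = (md5_tree($dir_a), md5_tree($dir_b));
    my @common   = sort { $a cmp $b } grep { exists $sums_b->{$_} } keys %$sums_a;
    my @uniq_a   = sort { $a cmp $b } grep { !exists $sums_b->{$_} } keys %$sums_a;
    my @uniq_b   = sort { $a cmp $b } grep { !exists $sums_a->{$_} } keys %$sums_b;
    my @modified = grep { $sums_a->{$_} ne $sums_b->{$_} } @common;
    return (\@common, \@uniq_a, \@uniq_b, \@modified);
}

if (@ARGV == 2) {
    my ($common, $uniq_a, $uniq_b, $modified) = compare_md5_trees(@ARGV);
    print "common to $ARGV[0], $ARGV[1]\n", map { "  $_\n" } @$common;
    print "unique to $ARGV[0]\n",           map { "  $_\n" } @$uniq_a;
    print "unique to $ARGV[1]\n",           map { "  $_\n" } @$uniq_b;
    print "modified files\n",               map { "  $_\n" } @$modified;
}
```

Like the snippet above, this compares content only; two files with identical bytes but different timestamps count as unchanged.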
Re: Directory comparison by PetaMem (Priest) on Jun 10, 2003 at 07:06 UTC
You may want to have a look at my diffy node. It's old code and could be done far more elegantly these days, but it is probably a good starting point for your task.

Bye
PetaMem
Re: Directory comparison by zentara (Cardinal) on Jun 10, 2003 at 14:37 UTC
I just saw File::Dircmp on CPAN. From its documentation:

SYNOPSIS

```perl
use File::Dircmp;

@r = dircmp($dir1, $dir2, $diff, $suppress);
```

DESCRIPTION

The dircmp command examines dir1 and dir2 and generates various tabulated information about the contents of the directories. Listings of files that are unique to each directory are generated for all the options. If no option is entered, a list is output indicating whether the file names common to both directories have the same contents.
Re: Directory comparison by maxl90 (Sexton) on Jun 09, 2003 at 17:43 UTC
Thanks for the suggestions, and keep them coming if anyone has any more. They have helped greatly. On a side note, I'm working on a Unix box to do this.
Re: Directory comparison by Anonymous Monk on Jun 10, 2003 at 16:37 UTC
You need to create titles.txt first, with the Windows/DOS directory listing command:

`dir /s > titles.txt`

Then you can work with the code below to put in the file time/date stamps as you need them.

```perl
use strict;
use warnings;

# Filled in by the loop below and printed by the STDOUT format
my ($filename, $dirval);

open(my $fd1, '<', 'titles.txt') or die "open: $!";
while (my $line1 = <$fd1>) {
    # "dir /s" prints a "Directory of ..." header before each folder's listing
    if ($line1 =~ /Directory/) {
        $dirval = $line1;
    }
    # Report entries containing "g1.jpg", skipping header and <DIR> lines
    if ($line1 =~ /g1\.jpg/ && $line1 !~ /Directory|<DIR>/) {
        # The file name follows the first 39 columns of a dir listing line
        if ($line1 =~ /.{39}(.*)$/) {
            $filename = $1;
            write;
        }
    }
}
close($fd1);
exit();

format STDOUT =
@<<<<<<<<<<<<<<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$filename, $dirval
.
```

Sample run:

```
D:\<snip>>perl mygrep.pl
message.jpg1.jpg  Directory of S:\<snip>\images
message.jpg1.jpg  Directory of S:\<snip>\images
editFunding1.jpg  Directory of S:\<snip>\images
funding1.jpg      Directory of S:\<snip>\images
tracking1.jpg     Directory of S:\<snip>\images
viewFunding1.jpg  Directory of S:\<snip>\images
g1.jpg            Directory of S:\<snip>\weirdos
g1.jpg            Directory of S:\<snip>\Personal
wag1.jpg          Directory of S:\<snip>\Internet redesig
staff-org1.jpg    Directory of S:\<snip>\cutouts
training1.jpg     Directory of S:\<snip>\cutouts
winning1.jpg      Directory of S:\<snip>
free_bg1.jpg      Directory of S:\<snip>\PP 2000
swefbldg1.jpg     Directory of S:\<snip>\<snip>grist
```