You seem to imply that this may take a long time to run. It shouldn't take any longer than the size of the trees requires. Or do you mean that the "side by side comparison" would take a long time? There are visual file comparison tools that can compare trees; if you are on a Win32 platform, I would recommend WinMerge (open source).
Personally, I would use GNU diff (available for most platforms) to do the quickest possible comparison for me...
diff --recursive --brief dir1 dir2
Believe me when I say that it's very unlikely that Perl would be able to do it any quicker. GNU diff happens to have a --side-by-side option too!
If I still felt it necessary to get Perl involved (say, for scheduled automation), I would capture and parse the results as appropriate. | [reply] [d/l] |
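For what it's worth, here is a minimal sketch of the "capture and parse" step. It assumes GNU diff's English messages ("Only in DIR: NAME" and "Files A and B differ"); the sub name parse_brief_diff is made up for illustration, and file names that themselves contain ": " or " and " will confuse the patterns.

```perl
use strict;
use warnings;

# Split the lines printed by `diff --recursive --brief` into
# paths present on one side only and pairs of files that differ.
sub parse_brief_diff {
    my @lines = @_;
    my (@only, @differ);
    for my $line (@lines) {
        chomp $line;
        if ($line =~ /^Only in (.+?): (.+)$/) {
            push @only, "$1/$2";          # directory plus entry name
        }
        elsif ($line =~ /^Files (.+) and (.+) differ$/) {
            push @differ, [$1, $2];       # [left path, right path]
        }
    }
    return (\@only, \@differ);
}

# Run diff only when two directories are given on the command line.
if (@ARGV == 2) {
    open my $fh, '-|', 'diff', '--recursive', '--brief', @ARGV
        or die "cannot run diff: $!";
    my ($only, $differ) = parse_brief_diff(<$fh>);
    close $fh;    # diff exits non-zero when it finds differences, so no die here
    print "only:   $_\n"      for @$only;
    print "differ: $_->[0]\n" for @$differ;
}
```

Using the list form of open avoids shell quoting problems with odd directory names.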
This is a perfect example of when not to use File::Find. It really isn't very hard to roll your own directory tree searcher, while the File::Find callback interface makes it impossible to traverse two directory trees at once.
Besides the standard gotchas to watch out for:
- don't follow symbolic links unless you do the extra work required
- readdir returns bare names, so for any directory other than "." you have to prepend the directory before you use the returned file names to get information about the files
- don't use a global directory handle in the opendir calls of a recursive subroutine
- don't recurse into "." or ".."
also realize that readdir doesn't return file names in sorted order (while almost all globs do), so you'll want to sort (and ignore case when you sort, if dealing with a file system that ignores case) before doing a merge-sort-style comparison to find missing/added/different files and directories (or files that became directories, or vice versa).
- tye
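A minimal sketch of that kind of hand-rolled walker, not tye's own code. The sub names (list_dir, kind, compare_trees) are made up for illustration. It compares modification times only and sorts case-sensitively, so on a case-insensitive file system both the sort and the comparison would need a case-folding version.

```perl
use strict;
use warnings;

# Sorted entries of $dir, without "." and "..".
sub list_dir {
    my ($dir) = @_;
    opendir my $dh, $dir or die "opendir $dir: $!";   # lexical handle, safe in recursion
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return sort @names;
}

# Classify a path; symbolic links are reported as links and never followed.
sub kind {
    my ($path) = @_;
    return 'link' if -l $path;
    return 'dir'  if -d $path;
    return 'file';
}

# Walk both trees in step, merging the two sorted name lists.
sub compare_trees {
    my ($left, $right, $report) = @_;
    my @l = list_dir($left);
    my @r = list_dir($right);
    while (@l || @r) {
        my $cmp = !@l ? 1 : !@r ? -1 : $l[0] cmp $r[0];
        if ($cmp < 0) { push @$report, "only in $left: "  . shift(@l); next }
        if ($cmp > 0) { push @$report, "only in $right: " . shift(@r); next }
        my $name = shift @l;
        shift @r;
        my ($lp, $rp) = ("$left/$name", "$right/$name");   # prepend the directory
        my ($lk, $rk) = (kind($lp), kind($rp));
        if ($lk ne $rk) {
            push @$report, "type changed: $rp ($lk -> $rk)";
        }
        elsif ($lk eq 'dir') {
            compare_trees($lp, $rp, $report);
        }
        elsif ($lk eq 'file' && (stat $lp)[9] != (stat $rp)[9]) {
            push @$report, "mtime differs: $rp";
        }
    }
    return $report;
}

if (@ARGV == 2) {
    print "$_\n" for @{ compare_trees(@ARGV, []) };
}
```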
| [reply] |
Use File::Find to print each full path and filename with its mod time on one line. Save the output to a file named with a date-and-time stamp.
Now you can use diff or sdiff (if you're on a Unix or Unix-like system), or diff.pl, or Cygwin's diff or sdiff (if on Windows), to compare any 2 files.
Or you can use a simple perl script to read 2 files and note which files are new, deleted, or changed.
Or you might consider RCS or CVS or some other source code control system.
--Bob Niederman, http://bob-n.com | [reply] |
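A rough sketch of the listing-file idea above. The sub names and the "listing-YYYYMMDD-HHMMSS.txt" file name are made up for illustration, and the tab-separated format assumes paths contain no tabs or newlines.

```perl
use strict;
use warnings;
use File::Find;
use POSIX qw(strftime);

# Write one "full/path<TAB>mtime" line per plain file under $dir, sorted by path.
sub write_listing {
    my ($dir, $out) = @_;
    my @lines;
    find(sub {
        return unless -f $_;
        push @lines, join("\t", $File::Find::name, (stat _)[9]);
    }, $dir);
    open my $fh, '>', $out or die "open $out: $!";
    print {$fh} "$_\n" for sort @lines;
    close $fh or die "close $out: $!";
}

# Read a listing back into a hash of path => mtime.
sub read_listing {
    my ($file) = @_;
    open my $fh, '<', $file or die "open $file: $!";
    my %mtime;
    while (my $line = <$fh>) {
        chomp $line;
        my ($path, $time) = $line =~ /^(.*)\t(\d+)$/ or next;
        $mtime{$path} = $time;
    }
    close $fh;
    return \%mtime;
}

# Compare an older listing file with a newer one.
sub compare_listings {
    my ($old, $new) = map { read_listing($_) } @_;
    my @added   = sort grep { !exists $old->{$_} } keys %$new;
    my @deleted = sort grep { !exists $new->{$_} } keys %$old;
    my @changed = sort grep { exists $old->{$_} && $old->{$_} != $new->{$_} } keys %$new;
    return (\@added, \@deleted, \@changed);
}

if (@ARGV == 1) {        # one argument: take a snapshot of that directory
    my $out = strftime('listing-%Y%m%d-%H%M%S.txt', localtime);
    write_listing($ARGV[0], $out);
    print "wrote $out\n";
}
elsif (@ARGV == 2) {     # two arguments: compare two snapshot files
    my ($added, $deleted, $changed) = compare_listings(@ARGV);
    print "new:     $_\n" for @$added;
    print "deleted: $_\n" for @$deleted;
    print "changed: $_\n" for @$changed;
}
```

Because the listings are sorted, plain diff on two of them gives readable output as well.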
Here's a little idea for you. If you are just checking modified times and not actual content, you can simply do something like:
- Do a File::Find on the old directory structure
- Store the information in a hash with the filenames (with full paths) as keys and the modified date/time as the value.
- Once you finish, do a File::Find on the new directory structure
- Compare the modify date/times of the new files with what you have stored for the old files
I don't know how many files you are talking about. Thousands? Millions? Don't underestimate Perl's speed: if all you are checking is modify times, then your bottleneck will be the disk I/O, not your script, and that would be the case with other solutions as well. If you are concerned about time and efficiency, you can separate the two passes above and store the hash from the File::Find run on the old directory structure using the Storable module. That way you cache the mod times of the files before you need to compare them to the new directory structure, and then just retrieve them when you run the script against the new directory.
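A minimal sketch of the cached variant, under a few assumptions of mine: keys are paths relative to each root (so an old and a new tree in different places line up), and the sub names plus the "cache"/"check" command words are made up for illustration.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;
use Storable qw(store retrieve);

# Map each plain file's path (relative to $root) to its modification time.
sub scan_mtimes {
    my ($root) = @_;
    my %mtime;
    find({ no_chdir => 1, wanted => sub {
        return unless -f $_;              # with no_chdir, $_ is the full path
        $mtime{ File::Spec->abs2rel($_, $root) } = (stat _)[9];
    } }, $root);
    return \%mtime;
}

# Files present in both scans whose modification times differ.
sub changed_files {
    my ($old, $new) = @_;
    my @changed = grep { exists $old->{$_} && $old->{$_} != $new->{$_} } keys %$new;
    return sort @changed;
}

# Hypothetical usage:
#   script.pl cache OLD_DIR CACHE_FILE    (ahead of time)
#   script.pl check NEW_DIR CACHE_FILE    (when the comparison is needed)
if (@ARGV == 3 && $ARGV[0] eq 'cache') {
    store(scan_mtimes($ARGV[1]), $ARGV[2]);
}
elsif (@ARGV == 3 && $ARGV[0] eq 'check') {
    print "$_\n" for changed_files(retrieve($ARGV[2]), scan_mtimes($ARGV[1]));
}
```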
HTH
| [reply] |
good morning.
again, all the good things have been mostly said, so here is a snippet of code that will give you, after you supply two directories, the common files, the files unique to the first directory, the files unique to the second, and those that have been modified.
to determine modification we use industrial-strength (gisle aas') MD5 rather than relying on file sizes, timestamps, etc., i.e. we detect differences in file content rather than in modification time. this works well with Ken Williams' Tie::TextDir, the other module used. the approach is good for source (and even binaries, i find), but you may want to look at times, in which case... go elsewhere!
use strict;
use Tie::TextDir;
use Digest::MD5 qw(md5_base64);

# default both directories to "." when not given on the command line
my ($dirA, $dirB) = map { $_ || "." } @ARGV[0, 1];
tie my %filsA, 'Tie::TextDir', $dirA, 'rw';
tie my %filsB, 'Tie::TextDir', $dirB, 'rw';
my (%common, @uniqA, @uniqB, @modified);

# digest each file of A that also exists in B; everything else is unique to A
for (keys %filsA) {
    if (exists $filsB{$_}) { $common{$_} = md5_base64($filsA{$_}) }
    else                   { push @uniqA, $_ }
}
@uniqB = grep { ! exists $filsA{$_} } keys %filsB;

# a common file counts as modified when its digest in B differs from A's
@modified = grep { md5_base64($filsB{$_}) ne $common{$_} } keys %common;
untie %filsA; untie %filsB;

print "** common to $dirA, $dirB **\n" . join "\n", keys %common;
print "\n** unique to $dirA **\n" . join "\n", @uniqA;
print "\n** unique to $dirB **\n" . join "\n", @uniqB;
print "\n** modified files **\n" . join "\n", @modified;
print "\n** finito **\n";
which produces output like
** common to chaffwin, chaffwin2 **
TO_DO
md5.h
README
COPYING
ChangeLog
Makefile
chaffwin.el
chaffwin.o
md5.o
chaffwin.c
chaffwin.pl
md5.c
CVS
** unique to chaffwin **
chaffwin.exe
oot
** unique to chaffwin2 **
chaffwin.exe
oot
** modified files **
chaffwin.o
** finito **
hope that helps
...wufnik
-- in the world of the mules there are no rules --
| [reply] [d/l] [select] |
You may want to have a look at my diffy node. It's old code and could be done much more elegantly these days, but it is probably a good starting point for your task.
Bye
PetaMem
| [reply] |
I just saw File::Dircmp on CPAN.
SYNOPSIS
use File::Dircmp;
@r = dircmp($dir1, $dir2, $diff, $suppress);
DESCRIPTION
The dircmp command examines dir1 and dir2 and generates various tabulated information about the contents of the directories. Listings of files that are unique to each directory are generated for all the options. If no option is entered, a list is output indicating whether the file names common to both directories have the same contents.
| [reply] |
Thanks for the suggestions, and keep them coming if anyone has any more. They have helped greatly. On a side note, I’m working on a Unix box to do this. | [reply] |
You need to create titles.txt like this:
dir /s > titles.txt
Then you can adapt the code below to add
the file time/date stamps as you need them.
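use strict;
use warnings;

# package variables, so the format below can see them
our ($filename, $dirval);

open(my $fd1, '<', 'titles.txt') or die "open: $!";
while (my $line1 = <$fd1>) {
    chomp $line1;
    # remember the most recent "Directory of ..." header
    if ($line1 =~ /Directory/) {
        $dirval = $line1;
        next;
    }
    # only lines naming a matching file, never <DIR> entries
    next unless $line1 =~ /g1\.jpg/ && $line1 !~ /<DIR>/;
    # the file name is whatever follows the first 39 characters of the line
    if ($line1 =~ /^.{39}(.*)$/) {
        $filename = $1;
        write;
    }
}
close($fd1);
exit();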
format STDOUT =
@<<<<<<<<<<<<<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$filename, $dirval
.
D:\<snip>>perl mygrep.pl
message.jpg1.jpg Directory of S:\<snip>\images
message.jpg1.jpg Directory of S:\<snip>\images
editFunding1.jpg Directory of S:\<snip>\images
funding1.jpg Directory of S:\<snip>\images
tracking1.jpg Directory of S:\<snip>\images
viewFunding1.jpg Directory of S:\<snip>\images
g1.jpg Directory of S:\<snip>\weirdos
g1.jpg Directory of S:\<snip>\Personal
wag1.jpg Directory of S:\<snip>\Internet redesig
staff-org1.jpg Directory of S:\<snip>\cutouts
training1.jpg Directory of S:\<snip>\cutouts
winning1.jpg Directory of S:\<snip>
free_bg1.jpg Directory of S:\<snip>\PP 2000
swefbldg1.jpg Directory of S:\<snip>\<snip>grist
| [reply] |