argv has asked for the wisdom of the Perl Monks concerning the following question:

In all the searches I've done on the subject, everyone has asked essentially the same basic question: "how do I move/copy/read/search a directory tree?" And, as we all know, File::Find is the tool of choice.

The problem is, this module only works on one tree descent at a time; trying to instantiate two copies of the code causes incorrect results. So, this renders the following problem a tad more complicated.

The scenario is this: I have various external disks. Some are working (live) disks that may contain about 50-200G worth of high-res images. Then there's the larger, 1-terabyte disk (RAID) that has directories that mirror the "live" working disks. Essentially, it's a recovery disk that I bring out once in a while to do back-ups. For reasons beyond the scope of this discussion, it has a series of standard directory trees, each of which represents one of the working disks.

What I need to do is run a perl script (which will later be embedded in a much larger perl app) that updates the back-up directory tree with the changes made on the live disk. I "could" do a simplistic "cp -r" type thing and just force-write everything over, but only sporadic files are updated, and usually not that often. Copying everything each time would be overkill, for reasons that I hope don't need to be clarified.

What I want to do is something similar to what "tar" can do: just update files that have been modified (including adding new files), leaving everything else alone. But the disk is not a tar archive. I could use the cygwin shell's "find", as on unix, to find files that have been updated since a certain time and just copy those, but for absolute verification I also need to check the timestamp (and size) of the corresponding file on the destination disk.
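
The timestamp-and-size check described above might be sketched roughly as follows. The helper name needs_update is hypothetical, and the rule it applies (copy when the destination is missing, differs in size, or is older than the source) is an assumption about what "needs updating" should mean here, not something the post specifies.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $src_file should be copied over $dest_file.
sub needs_update {
    my ($src_file, $dest_file) = @_;

    my @s = stat($src_file) or die "can't stat $src_file: $!";
    my @d = stat($dest_file);

    return 1 unless @d;            # destination missing: treat as a new file
    return 1 if $s[7] != $d[7];    # field 7 is the size in bytes
    return 1 if $s[9] >  $d[9];    # field 9 is mtime; source is newer
    return 0;
}
```

A caller would pass the same relative path under each root, for example needs_update("$src/$rel", "$dest/$rel").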

I wrote a find function that uses File::Find, but as noted, it can't descend two trees at the same time, and I can't instantiate two copies. So, what more is there I can do? I wrote the following program, called "duofind", which descends srcdir and reports only files that also exist in destdir (I haven't done the lstat() tests to see if the file needs updating yet) because I need to solve the more important problem: it does NOT report files that DO NOT exist in destdir, nor does it report files in destdir that do not exist in srcdir.

Here's what I have so far:

#!/usr/bin/perl -w

=head1 Command: duofind srcdir destdir

Descends srcdir and reports only files that also exist in destdir.
It does NOT report files that DO NOT exist in destdir, nor does it
report files in destdir that do not exist in srcdir.

=cut

use strict;
use File::Find;

my ($src, $dest, $file1);

# File::Find callback: report any destdir file whose name matches $file1.
sub cmp_name {
    return if -d;
    if ($file1 eq $_) {
        print "### Found in $File::Find::dir/$_\n";
    }
}

# Walk $f by hand; for each plain file, scan all of $dest for the same name.
sub doit {
    my $f = shift;
    # "or", not "||": with "||" the die could never fire
    opendir(my $dh, $f) or die "can't open $f: $!";
    foreach my $file (readdir $dh) {
        next if $file eq "." || $file eq "..";
        if (-d "$f/$file") { doit("$f/$file"); next }
        $file1 = $file;
        print "Looking for $f/$file1\n";
        find({ wanted => \&cmp_name }, $dest);
    }
    closedir $dh;
}

die "need a src dir"  if !($src  = shift);
die "need a dest dir" if !($dest = shift);
die "src cannot equal dest" if $src eq $dest;
doit($src);

Pardon the sloppy coding style.. this is just a temporary codeset for testing purposes. dan

2004-11-27 Edited by Arunbear: Changed title from 'searching and <i>modifying</i> two directory trees'; HTML in node titles is not supported

Replies are listed 'Best First'.
Re: searching and modifying two directory trees
by diotalevi (Canon) on Nov 25, 2004 at 02:27 UTC

    I'd solve this with rsync. I think its defaults even mirror the logic you requested. I use this on my Windows XP laptop to mirror my data up to my network share. I have to add options (such as --delete) to get it to delete stuff on the target side - the default is to leave alone target-side stuff that doesn't exist on the source.

    $ rsync -a source_dir/ target_dir/

    Added, minutes later.

    Alternatively, you can use an iterator form of File::Find at Re: Re: (Perl6) Groking Continuations (iterators) and then just walk each directory tree in sequence. This will allow you to write this in perl with no serious problems. It's still easier to just have rsync handle it.
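
    Walking the trees one after the other does not strictly need an iterator. A minimal sketch, assuming it is acceptable to hold every relative path in memory: run plain File::Find over each tree in turn, then compare the two sets. The names list_files and compare_trees are hypothetical, not part of any module.

    ```perl
    use strict;
    use warnings;
    use File::Find;
    use File::Spec;

    # Return a hash ref of relative path => 1 for every plain file under $root.
    sub list_files {
        my ($root) = @_;
        my %seen;
        find({
            no_chdir => 1,
            wanted   => sub {
                return unless -f $_;    # with no_chdir, $_ is the full path
                $seen{ File::Spec->abs2rel($_, $root) } = 1;
            },
        }, $root);
        return \%seen;
    }

    # Hypothetical helper: sort files into src-only, dest-only, and common.
    sub compare_trees {
        my ($src, $dest) = @_;
        my $s = list_files($src);     # the first walk finishes...
        my $d = list_files($dest);    # ...before the second one starts
        return {
            only_src  => [ sort grep { !$d->{$_} } keys %$s ],
            only_dest => [ sort grep { !$s->{$_} } keys %$d ],
            both      => [ sort grep {  $d->{$_} } keys %$s ],
        };
    }
    ```

    The only_src list holds the new files to copy, only_dest holds candidates for deletion, and both holds the files that still need a timestamp or size check.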

      If it weren't for the fact that I need (or would like) to integrate this into a much broader package of perl tools that I'm writing, rsync would be great. Perhaps the broader solution would be to modify File::Find to support new(), so that unique iterations can be instantiated on an as-needed basis.

      Also, it would be nice if the wanted code could return a boolean to indicate whether the caller should continue. This would be handy for those looking to scan directories that have thousands of entries.
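
      File::Find has no such return value, though it does provide $File::Find::prune for skipping a single subtree. One common workaround for abandoning the whole walk is to die inside the wanted callback and catch that with eval. A minimal sketch; find_first is a hypothetical name, and the "stop\n" message is just an arbitrary marker.

      ```perl
      use strict;
      use warnings;
      use File::Find;

      # Hypothetical helper: return the first path under $root for which
      # $pred->($path) is true, abandoning the rest of the walk.
      sub find_first {
          my ($root, $pred) = @_;
          my $hit;
          eval {
              find({
                  no_chdir => 1,    # don't get stranded in a subdirectory on die
                  wanted   => sub {
                      return unless $pred->($_);
                      $hit = $_;
                      die "stop\n";    # unwind out of find()
                  },
              }, $root);
              1;
          } or do {
              die $@ unless $@ eq "stop\n";    # re-throw anything unexpected
          };
          return $hit;
      }
      ```

      For example, find_first($dir, sub { $_[0] =~ /\.jpg$/ }) returns as soon as one matching path turns up, or undef if there is none.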

      BTW, where's rsync for WinXP? It doesn't seem to be part of my cygwin distribution... perhaps it was an option I didn't select? Darn, and it's been a while since I even installed this stuff... ok, I guess I'll go look. dan

        where's rsync... perhaps it was an option I didn't select?

        Yes, you did not select it, but cygwin offers it.

        Cheers, Sören

Re: searching and modifying two directory trees
by Velaki (Chaplain) on Nov 25, 2004 at 04:24 UTC

    Hmmmm...What about collecting the datestamps and MD5 checksums of the files in a data structure; then, update only the files that are out of sync?

    Yeah, this is essentially a variation on rsync, but you should be able to whip up a quick module to do it.

    Map out the sync logic carefully, and you should have no problems.
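
    Collecting that per-file data could look something like the sketch below, which uses the core Digest::MD5 module. The names file_signature and out_of_sync are hypothetical, and treating two files as equal when size and digest both match is an assumption about the sync rule.

    ```perl
    use strict;
    use warnings;
    use Digest::MD5;

    # Hypothetical helper: record size, mtime, and MD5 hex digest for one file.
    sub file_signature {
        my ($path) = @_;
        open(my $fh, '<', $path) or die "can't open $path: $!";
        binmode $fh;    # images are binary data; this matters on Windows
        my ($size, $mtime) = (stat $fh)[7, 9];
        my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
        close $fh;
        return { size => $size, mtime => $mtime, md5 => $md5 };
    }

    # True if two signatures describe different content.
    sub out_of_sync {
        my ($x, $y) = @_;
        return $x->{size} != $y->{size} || $x->{md5} ne $y->{md5};
    }
    ```

    A digest means reading every byte of every file, so on disks holding 50-200G it may be worth comparing sizes first and checksumming only when they match.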

    Just a thought,
    -v
    "Perl. There is no substitute."
Re: searching and modifying two directory trees
by merlyn (Sage) on Nov 25, 2004 at 13:23 UTC
Re: searching and modifying two directory trees
by zentara (Cardinal) on Nov 25, 2004 at 14:13 UTC