hiradhu has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to optimize my project's file server. The folder structure is like this:
--- Parent A
|---Sub B (Sub folder of Parent A)
|------Sub C (Sub folder of Sub B)
         |--- File A (File within Sub C)
         |--- File B (File within Sub C)
|------Sub D (Sub folder of Sub B)
         |--- File A (File within Sub D)
         |--- File C (File within Sub D)
--- Parent B
....likewise many parent folders
Over time the file server has accumulated a huge number of files and duplicates. Here Sub C -> File A and Sub D -> File A are the same to some extent, but one of them is newer. I'm trying to consolidate it as follows:


1. If I decide to retain only Folder Sub C

a. Check for duplicate files (like File A) in both Sub D & Sub C.

b. Check which copy of File A (and of any other duplicate file) is newer, comparing Sub D & Sub C

c. Copy the newer file to Sub C

d. Copy the rest of the files from Sub D to Sub C

My problems are:
1. Comparing time stamps: use of stat(). I don't know how to use "WIN32_SLOPPY_STAT".

2. Recursive search: which one to use? grep?

- Use of XCopy?
I found File::Xcopy. Its "update|UD" action copies files only if 1) the file exists in the target and 2) it is newer in time stamp: http://search.cpan.org/~geotiger/File-Xcopy-0.12/Xcopy.pm#xmove($from,_$to,_$pat,_$par)

Any help, please?

Re: Retaining the most recent file in a FS
by Fletch (Bishop) on Dec 22, 2008 at 13:48 UTC

    Sounds like you're trying to kludge something together rather than just biting the bullet and using a proper VCS (e.g. git, mercurial, svn).

    The cake is a lie.

Re: Retaining the most recent file in a FS
by Corion (Patriarch) on Dec 22, 2008 at 12:50 UTC

    WIN32_SLOPPY_STAT is an optimization which does not come into play in your case.
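
For the timestamp comparison itself, plain stat() is enough: element 9 of the list it returns is the last-modification time in epoch seconds. The sketch below is only an illustration; the helper name newer_of is hypothetical, and it leaves ${^WIN32_SLOPPY_STAT} untouched because, as perlvar describes it, that variable only changes how stat() gathers its data on Windows.

```perl
use strict;
use warnings;

# Hypothetical helper: return whichever of two paths was modified
# more recently. On a tie the first argument wins.
sub newer_of {
    my ($first, $second) = @_;

    # (stat ...)[9] is the mtime; it is undef if stat() failed.
    my $first_mtime = (stat $first)[9];
    defined $first_mtime or die "Cannot stat $first: $!";

    my $second_mtime = (stat $second)[9];
    defined $second_mtime or die "Cannot stat $second: $!";

    return $second_mtime > $first_mtime ? $second : $first;
}
```

Called as newer_of("Sub C/File A", "Sub D/File A"), it would hand back the path of the copy worth keeping.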

    Use File::Find to recursively find files.

    Use File::Copy to copy files and File::Path to create directories.
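
Put together, those three modules could cover steps a to d of the question roughly as follows. This is a sketch under assumptions, not tested against a real file server: merge_newer and its two arguments are hypothetical names, "duplicate" is taken to mean "same relative path in both folders", and "newer" is decided by mtime alone.

```perl
use strict;
use warnings;
use File::Find;
use File::Copy qw(copy);
use File::Path qw(make_path);
use File::Basename qw(dirname);
use File::Spec;

# Hypothetical helper: merge $src_dir (e.g. Sub D) into $keep_dir
# (e.g. Sub C). A file is copied when it is missing from $keep_dir
# or when the copy in $src_dir has a strictly newer mtime.
# Returns the relative paths that were copied.
sub merge_newer {
    my ($src_dir, $keep_dir) = @_;
    my @copied;

    find({
        no_chdir => 1,    # keep full paths in $File::Find::name
        wanted   => sub {
            my $src = $File::Find::name;
            return unless -f $src;

            my $rel  = File::Spec->abs2rel($src, $src_dir);
            my $dest = File::Spec->catfile($keep_dir, $rel);

            if (-e $dest) {
                # Same name on both sides: keep the existing copy
                # unless the source is strictly newer.
                return unless (stat $src)[9] > (stat $dest)[9];
            }

            make_path(dirname($dest));
            copy($src, $dest) or die "Copy $src -> $dest failed: $!";

            # copy() does not carry the timestamp over, so restore it
            # to keep later comparisons meaningful.
            my ($atime, $mtime) = (stat $src)[8, 9];
            utime $atime, $mtime, $dest;

            push @copied, $rel;
        },
    }, $src_dir);

    return @copied;
}
```

Nothing is deleted from the source folder here; removing Sub D afterwards is left as a separate, deliberate step.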

    Where do you have problems exactly? What code have you written already?

Re: Retaining the most recent file in a FS
by bruno (Friar) on Dec 22, 2008 at 13:51 UTC
    To find duplicate files, you can use the module File::Find::Duplicates. Here's a script that uses it; it could serve as an example, or you can use it as-is.

    Alternatively, you can use an external application, like fdupes (it's in Debian repositories).
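
If installing a module or an external tool is not an option, the same idea (files count as duplicates when their contents match) can be sketched with core modules only. To be clear, this is not the File::Find::Duplicates API; duplicate_groups is a hypothetical helper that first groups files by size and then by MD5 digest.

```perl
use strict;
use warnings;
use File::Find;
use Digest::MD5;

# Hypothetical helper: return a list of array refs, each holding the
# paths of files under the given directories with identical content.
sub duplicate_groups {
    my @dirs = @_;

    # Pass 1: bucket files by size, since files of different size
    # cannot be identical.
    my %by_size;
    find({
        no_chdir => 1,
        wanted   => sub {
            my $path = $File::Find::name;
            return unless -f $path;
            my $size = -s $path;
            push @{ $by_size{$size} }, $path;
        },
    }, @dirs);

    # Pass 2: digest only the files that share a size with another.
    my %by_content;
    for my $size (keys %by_size) {
        my @paths = @{ $by_size{$size} };
        next unless @paths > 1;
        for my $path (@paths) {
            open my $fh, '<', $path or die "Cannot open $path: $!";
            binmode $fh;
            my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
            close $fh;
            push @{ $by_content{"$size:$digest"} }, $path;
        }
    }

    return grep { @$_ > 1 } values %by_content;
}
```

Note that this finds byte-identical files, whereas the original question is about files that share a name but differ in age, so it complements a timestamp comparison rather than replacing it.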