in reply to Scanning for duplicate files
This isn't a direct answer to your question, but I have had this problem before and wrote the following to sort it out.
It uses MD5 signatures to find duplicates regardless of the filename, size, or time. It's pretty fast as well.
Oh, it's a bit primitive, sorry, I wrote it soon after I started learning perl.
use warnings;
use strict;
use Digest::MD5;
use File::Find;

$| = 1;    # autoflush ON!

my %dupes;
my @delete;
my %digests;
my $ctx = Digest::MD5->new;

sub check_file {
    my $file = shift;
    $ctx->reset;
    # Three-argument open with a lexical filehandle; the original
    # "open FILE, $file || die" never triggered the die because ||
    # binds to $file, not to open's return value.
    open my $fh, '<', $file or die "Can't open $file: $!\n";
    binmode $fh;
    $ctx->addfile($fh);
    close $fh;
    my $digest = $ctx->hexdigest;
    if ( exists $digests{$digest} ) {
        print "\t$file is a dupe!\n";
        $dupes{$digest}->{$file} = 1;
        push @delete, $file;
    }
    else {
        $digests{$digest} = $file;
    }
}

# CHANGE ME!!!
my $path = 'D:/Development/Perl/';

print "I am going to look for duplicates starting at $path\n";
find(
    {
        wanted   => sub { -f $_ ? check_file($_) : print "Searching $_\n" },
        no_chdir => 1,
    },
    $path
);
print "There are " . @delete . " duplicate files to delete.\n";
# Uncomment the line below to lose the duplicates!
# print "Deleted " . unlink(@delete) . " files!";
Yves
--
You are not ready to use symrefs unless you already know why they are bad. -- tadmc (CLPM)