in reply to Scanning for duplicate files
I think size is too loose a selector for duplicate files. Coincidence is more likely than you may think for a collection of files with a common format, stereotyped content, or small size. Since you want to unlink the dupes, it would be advisable to play it safe.
An md5 digest is a better indicator. Here is one way to use it:
    my %cksums;
    # index each file's name under its md5 digest (first field of md5sum's output)
    push @{ $cksums{ (split ' ', `md5sum "$_"`)[0] } }, $_ for glob "$dir/*";
    # for every digest seen more than once, keep the first file and unlink the rest
    unlink( splice @{ $cksums{$_} }, 1 ) or die $! for grep { @{ $cksums{$_} } > 1 } keys %cksums;

This is fairly idiomatic. The first two statements construct a hash of arrays: the arrays contain the names of duplicate files, indexed by checksum (only the digest field of md5sum's output is used as the key, so files with identical content land in the same bucket). For each digest that indexes more than one file, we unlink the extra files pruned off by splice, dying if none of them can be removed.
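If you would rather not shell out to md5sum once per file, the core Digest::MD5 module computes the same digests in pure Perl. A minimal sketch along the same lines, assuming a $dir variable as above (the command-line default is hypothetical):

    use strict;
    use warnings;
    use Digest::MD5;

    my $dir = shift // '.';   # directory to scan (hypothetical default)
    my %cksums;
    for my $file (grep { -f } glob "$dir/*") {
        open my $fh, '<', $file or die "$file: $!";
        binmode $fh;          # digest the raw bytes, not a decoded text stream
        push @{ $cksums{ Digest::MD5->new->addfile($fh)->hexdigest } }, $file;
    }
    for my $digest (keys %cksums) {
        my @extra = splice @{ $cksums{$digest} }, 1;   # keep the first copy
        unlink @extra or die $! if @extra;
    }

This avoids spawning a process per file and sidesteps shell-quoting problems with odd filenames.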
After Compline,
Zaxo