Hi Amoe.

This isn't a direct answer to your question, but I've had this problem before and wrote the following to sort it out.

It uses MD5 digests of the file contents to find duplicates, regardless of filename, size, or timestamp. It's pretty fast as well.

Oh, it's a bit primitive, sorry. I wrote it soon after I started learning Perl.

use warnings;
use strict;
use Digest::MD5;
use File::Find;

$| = 1;    # Autoflush ON!

my %dupes;      # digest => { duplicate file => 1 }
my @delete;     # every file whose digest has already been seen
my %digests;    # digest => first file seen with that digest
my $ctx = Digest::MD5->new;

sub check_file {
    my $file = shift;
    $ctx->reset;
    # "or" rather than "||", so the die actually fires when open fails
    open my $fh, '<', $file or die "Can't open $file: $!\n";
    binmode $fh;
    $ctx->addfile($fh);
    close $fh;
    my $digest = $ctx->hexdigest;
    if (exists $digests{$digest}) {
        print "\t$file is a dupe!\n";
        $dupes{$digest}->{$file} = 1;
        push @delete, $file;
    }
    else {
        $digests{$digest} = $file;
    }
}

# CHANGE ME!!!
my $path = 'D:/Development/Perl/';
print "I am going to look for duplicates starting at $path\n";

find(
    {
        wanted => sub {
            if (-f $_) { check_file($_) }
            else       { print "Searching $_\n" }
        },
        no_chdir => 1,
    },
    $path
);

print "There are " . @delete . " duplicate files to delete.\n";

# Uncomment the below line to lose the duplicates!
# print "Deleted " . unlink(@delete) . " files!";
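If you'd rather look at the duplicates in groups before deleting anything, here is a minimal sketch of the same digest idea as two small subs. The names md5_of_file and group_duplicates are made up for illustration and aren't part of the script above. It takes its file list from @ARGV rather than walking a directory.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: return the MD5 hex digest of one file's contents.
sub md5_of_file {
    my ($file) = @_;
    open my $fh, '<', $file or die "Can't open $file: $!\n";
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# Hypothetical helper: group paths by digest and return only the
# groups (array refs) that contain more than one file.
sub group_duplicates {
    my @files = @_;
    my %by_digest;
    push @{ $by_digest{ md5_of_file($_) } }, $_ for @files;
    return grep { @$_ > 1 } values %by_digest;
}

# Print each group of identical files, one group per paragraph.
for my $group (group_duplicates(@ARGV)) {
    print join("\n\t", @$group), "\n\n";
}
```

Nothing gets deleted here. It only reports, so you can check the groups by eye first.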

Yves
--
You are not ready to use symrefs unless you already know why they are bad. -- tadmc (CLPM)


In reply to Re: Scanning for duplicate files by demerphq
in thread Scanning for duplicate files by Amoe
