If I understand your code right, you are only looking for files that have not only the same content but also the same name. I don't think that's too common, though YMMV.
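Matching on content alone means the file name plays no part: you key on a digest of the bytes and collect every path that produces the same digest. Below is a minimal sketch of that idea. It assumes only the core modules File::Find and Digest::MD5, and the sub name group_by_content is hypothetical, not taken from either script:

```perl
use strict;
use warnings;
use File::Find qw(find);
use Digest::MD5;

# Group files under the given directories by the MD5 digest of their
# contents; file names play no part in the comparison.
# Returns a hash ref: hex digest => [ sorted paths sharing that content ],
# keeping only digests that occur more than once.
sub group_by_content {
    my @dirs = @_;
    my %paths_by_digest;
    find(
        {
            no_chdir => 1,    # $_ holds the full path, no chdir into subdirs
            wanted   => sub {
                return unless -f $_;                  # plain files only
                open my $fh, '<', $_ or return;       # skip unreadable files
                binmode $fh;
                my $md5 = Digest::MD5->new;
                $md5->addfile($fh);
                close $fh;
                push @{ $paths_by_digest{ $md5->hexdigest } }, $File::Find::name;
            },
        },
        @dirs
    );

    # Keep only the digests seen more than once, i.e. real duplicates.
    my %dups;
    for my $digest (keys %paths_by_digest) {
        my @paths = @{ $paths_by_digest{$digest} };
        $dups{$digest} = [ sort @paths ] if @paths > 1;
    }
    return \%dups;
}
```

Calling group_by_content('pics', 'backup') would return one entry per set of identical files, whatever they are named.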

Here's my version:

use Cwd;
use File::Find;
use Win32::Process;   # Windows only: starts the viewer processes
use Digest::MD5;
use DB_File;
use Fcntl qw(O_CREAT O_RDWR);
use FileHandle;       # for ->autoflush() on bareword handles

$cwd = cwd();
print "In directory $cwd\n";

# MD5 digest => first file seen with that content, kept in an on-disk DB
tie %files, 'DB_File', "duplicity_db", O_CREAT|O_RDWR, 0700
    or die "Cannot open duplicity_db: $!\n";
undef %files;    # start from an empty database on every run

$md5 = Digest::MD5->new;

# Optional first argument: -autodel or -autodelonly
if (lc $ARGV[0] eq '-autodel') {
    $autodel = 1; $onlyautodel = 0;
    shift @ARGV;
} elsif (lc $ARGV[0] eq '-autodelonly') {
    $autodel = 1; $onlyautodel = 1;
    shift @ARGV;
} else {
    $autodel = 0; $onlyautodel = 0;
}

$viewer  = shift @ARGV;
@basedir = @ARGV;
die "Usage: duplicity [-autodel|-autodelonly] viewer basedir1 basedir2 ...\n"
    unless $viewer and @basedir;

# Pairs the user already decided to keep, stored as "file1" = "file2"
if (open DUP, "<__keep_duplicity.log") {
    while (<DUP>) {
        /^"(.*?)"\s*=\s*"(.*)"/ and $duplicity{$1} = $2;
    }
    close DUP;
    open DUP, ">>__keep_duplicity.log";
} else {
    print "No __keep_duplicity.log found, assuming empty assumptions :-)\n";
    open DUP, ">__keep_duplicity.log";
}
DUP->autoflush();
print DUP "\n"; # just to make sure the items do not get joined

open DUPLOG, ">__duplicity.log";
DUPLOG->autoflush();
print DUPLOG "\n"; # just to make sure the items do not get joined

# Binary MD5 digest of a file's contents, or undef if it can't be read
sub getHash {
    my $file = shift;
    my $FH;
    open $FH, '<', $file or return;
    binmode $FH;
    $md5->reset();
    $md5->addfile($FH);
    close $FH;
    return $md5->digest();
}

# File::Find callback; find() has chdir'ed into $File::Find::dir
sub file {
    return if -d $_;
    my ($hash, $duplicate);
    my $file = $_;
    $hash = getHash($file);
    return unless defined $hash;    # otherwise all unreadable files would share one key
    if ($duplicate = $files{$hash}
        and !(   $duplicity{"$File::Find::dir\\$file"} eq $duplicate
              or $duplicity{$duplicate} eq "$File::Find::dir\\$file")
    ) {
        # Same content as a file seen earlier, and not a pair we chose to keep
        print qq{"$duplicate" = "$File::Find::dir\\$file"\n};
        if (!$autodel or !autodelete("$cwd\\$File::Find::dir\\$file", "$cwd\\$duplicate")) {
            # Show both files in the viewer and wait until both windows are closed
            Win32::Process::Create($proc1, $viewer, qq{$viewer "$cwd\\$File::Find::dir\\$file"},
                0, DETACHED_PROCESS, cwd) or die "ERROR: $^E\n";
            sleep 1;
            Win32::Process::Create($proc2, $viewer, qq{$viewer "$cwd\\$duplicate"},
                0, DETACHED_PROCESS, cwd) or die "ERROR: $^E\n";
            $proc1->Wait(INFINITE);
            $proc2->Wait(INFINITE);
        }
        if (!$onlyautodel) {
            if (-e "$cwd\\$File::Find::dir\\$file") {
                if (-e "$cwd\\$duplicate") {
                    # Both kept: remember the pair so it isn't reported again
                    print DUP qq{"$duplicate" = "$File::Find::dir\\$file"\n};
                } else {
                    # Older copy deleted: the current file now stands for this content
                    $files{$hash} = "$File::Find::dir\\$file";
                    print DUPLOG qq{"$duplicate" = "$File::Find::dir\\$file"\n};
                }
            } else {
                if (-e "$cwd\\$duplicate") {
                    print DUPLOG qq{"$File::Find::dir\\$file" = "$duplicate"\n};
                } else {
                    print "\tBoth deleted\n";
                }
            }
        }
    } else {
        $files{$hash} = "$File::Find::dir\\$file";
    }
}

find(\&file, @basedir);

# Decide from the names alone which copy to delete. Returns true if the
# viewer is not needed (a file was deleted, or -autodelonly is in effect).
sub autodelete {
    my ($file1, $file2) = @_;
    # Names ending in -<digits>.<ext> (e.g. "photo-2.jpg") are deleted first
    if ($file2 =~ /-\d+\.\w+$/) {
        print "\tautodeleted $file2\n";
        unlink $file2;
        return 1;
    }
    if ($file1 =~ /-\d+\.\w+$/) {
        print "\tautodeleted $file1\n";
        unlink $file1;
        return 1;
    }
    # Same base name (case-insensitive) once a trailing "-N" or "--N" is stripped
    $file1 =~ m{[/\\]([^/\\]*?)$}; my $basename1 = $1;
    $file2 =~ m{[/\\]([^/\\]*?)$}; my $basename2 = $1;
    $basename1 =~ s/-?-\d*(\.\w+)$/$1/;
    $basename2 =~ s/-?-\d*(\.\w+)$/$1/;
    if (lc($basename1) eq lc($basename2)) {
        print "\tautodeleted $file2\n";
        unlink $file2;
        return 1;
    }
    # A purely numeric name (e.g. "0042.jpg") loses to the other copy
    if ($basename2 =~ /^\d+\.\w+$/) {
        print "\tautodeleted $file2\n";
        unlink $file2;
        return 1;
    }
    if ($basename1 =~ /^\d+\.\w+$/) {
        print "\tautodeleted $file1\n";
        unlink $file1;
        return 1;
    }
    if ($onlyautodel) {
        print "\tuser has to decide\n";
        return 1;
    }
    return 0;
}

It scans several directories (including their subdirectories), computes the MD5 hash of each file's contents, stores the hashes in a hash tied to a DB_File database and reports duplicates. With the -autodel or -autodelonly switch it even deletes some duplicates automatically, based on their names. Pairs it can't decide about are opened in an image viewer (unless -autodelonly is given), so that I can choose which one to delete based on the name and path. It's Windows only, but I think it would be no big deal to port it to Unix. The only tricky part is that I need to start two processes and wait until both of them exit, which AFAIK has to be implemented differently on the two OSes.
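On Unix, "start two processes, then wait for both" maps onto fork, exec and waitpid. Here is a rough sketch under that assumption. The sub name run_all_and_wait is hypothetical, and this is not tested against any particular image viewer; it would stand in for the two Win32::Process::Create calls and the pair of Wait(INFINITE) calls:

```perl
use strict;
use warnings;
use POSIX ();

# Start every command at once, then block until all of them have exited.
# Each command is an array ref handed to exec in list form, so no shell is
# involved and file names with spaces need no quoting.
# Returns the exit codes in the same order as the commands.
sub run_all_and_wait {
    my @commands = @_;
    my @pids;
    for my $cmd (@commands) {
        my $pid = fork;
        die "fork failed: $!\n" unless defined $pid;
        if ($pid == 0) {
            # Child: replace this process with the viewer (or other command).
            exec { $cmd->[0] } @$cmd;
            # Only reached if exec itself failed.
            warn "exec $cmd->[0] failed: $!\n";
            POSIX::_exit(127);    # skip END blocks and buffer flushing in the child
        }
        push @pids, $pid;
    }

    # Parent: wait for each child in turn; the order of exit doesn't matter.
    my @codes;
    for my $pid (@pids) {
        waitpid($pid, 0);
        push @codes, $? >> 8;
    }
    return @codes;
}
```

For the script above that would be something like run_all_and_wait([$viewer, $file1], [$viewer, $file2]), with $file1 and $file2 holding the two paths. The viewer has to stay in the foreground until its window is closed; a program that hands the file to an already running instance and exits at once would make the wait return immediately.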

Jenda
XML sucks. Badly. SOAP on the other hand is the most powerful vacuum pump ever invented.


In reply to Re: Comparing duplicate pictures in different directories by Jenda
in thread Comparing duplicate pictures in different directories by cajun
