in reply to Comparing duplicate pictures in different directories

If I understand your code right, you are only looking for files that have not only the same content, but also the same name. I don't think that's too common, though YMMV.

Here's my version:

use Cwd;
use File::Find;
use Win32::Process;
use Digest::MD5;
use DB_File;
use FileHandle;

$cwd = cwd();
print "In directory $cwd\n";

# Win32 values for the open flags; use Fcntl instead if you care about portability.
sub O_CREAT { 256 }
sub O_RDWR  { 2 }

# The hash-to-path map is kept on disk so huge directory trees don't eat RAM.
tie %files, 'DB_File', "duplicity_db", O_CREAT|O_RDWR, 0700;
undef %files;

$md5 = Digest::MD5->new;

if (lc $ARGV[0] eq '-autodel') {
    $autodel = 1; $onlyautodel = 0;
    shift @ARGV;
} elsif (lc $ARGV[0] eq '-autodelonly') {
    $autodel = 1; $onlyautodel = 1;
    shift @ARGV;
} else {
    $autodel = 0; $onlyautodel = 0;
}

$viewer  = shift @ARGV;
@basedir = @ARGV;
die "Usage: duplicity [-autodel|-autodelonly] viewer basedir1 basedir2 ..."
    unless $viewer and @basedir;

# Pairs recorded in __keep_duplicity.log were already reviewed and kept,
# so they are not reported again.
if (open DUP, "<__keep_duplicity.log") {
    while (<DUP>) {
        /^"(.*?)"\s*=\s*"(.*)"/ and $duplicity{$1} = $2;
    }
    close DUP;
    open DUP, ">>__keep_duplicity.log";
} else {
    print "No __keep_duplicity.log found, assuming empty assumptions :-)\n";
    open DUP, ">__keep_duplicity.log";
}
DUP->autoflush();
print DUP "\n"; # just to make sure the items do not get joined

open DUPLOG, ">__duplicity.log";
DUPLOG->autoflush();
print DUPLOG "\n"; # just to make sure the items do not get joined

sub getHash {
    my $file = shift;
    my $FH;
    open $FH, $file or return;
    binmode $FH;
    $md5->reset();
    $md5->addfile($FH);
    close $FH;
    return $md5->digest();
}

sub file {
    return if -d $_;
    my ($hash, $duplicate);
    my $file = $_;
    $hash = getHash($file);
    if ($duplicate = $files{$hash}
        and !( $duplicity{"$File::Find::dir\\$file"} eq $duplicate
            or $duplicity{$duplicate} eq "$File::Find::dir\\$file" )
    ) {
        print qq{"$duplicate" = "$File::Find::dir\\$file"\n};
        if (!$autodel
            or !autodelete("$cwd\\$File::Find::dir\\$file", "$cwd\\$duplicate")) {
            # Open both copies in the viewer and wait until both are closed.
            Win32::Process::Create($proc1, $viewer,
                qq{$viewer "$cwd\\$File::Find::dir\\$file"},
                0, DETACHED_PROCESS, cwd
            ) or die "ERROR: $^E\n";
            sleep 1;
            Win32::Process::Create($proc2, $viewer,
                qq{$viewer "$cwd\\$duplicate"},
                0, DETACHED_PROCESS, cwd
            ) or die "ERROR: $^E\n";
            $proc1->Wait(INFINITE);
            $proc2->Wait(INFINITE);
        }
        if (! $onlyautodel) {
            # Record what was done with the pair.
            if (-e "$cwd\\$File::Find::dir\\$file") {
                if (-e $duplicate) {
                    print DUP qq{"$duplicate" = "$File::Find::dir\\$file"\n};
                } else {
                    $files{$hash} = "$File::Find::dir\\$file";
                    print DUPLOG qq{"$duplicate" = "$File::Find::dir\\$file"\n};
                }
            } else {
                if (-e "$cwd\\$duplicate") {
                    print DUPLOG qq{"$File::Find::dir\\$file" = "$duplicate"\n};
                } else {
                    print "\tBoth deleted\n";
                }
            }
        }
    } else {
        $files{$hash} = "$File::Find::dir\\$file";
    }
}

find(\&file, @basedir);

# Try to pick the copy to delete automatically: prefer deleting the file whose
# name looks auto-generated ("foo-2.jpg", "12345.jpg").
sub autodelete {
    my ($file1, $file2) = @_;
    if ($file2 =~ /-\d+\.\w+$/) {
        print "\tautodeleted $file2\n";
        unlink $file2;
        return 1;
    }
    if ($file1 =~ /-\d+\.\w+$/) {
        print "\tautodeleted $file1\n";
        unlink $file1;
        return 1;
    }
    $file1 =~ m{[/\\]([^/\\]*?)$};
    my $basename1 = $1;
    $file2 =~ m{[/\\]([^/\\]*?)$};
    my $basename2 = $1;
    $basename1 =~ s/-?-\d*(\.\w+)$/$1/;
    $basename2 =~ s/-?-\d*(\.\w+)$/$1/;
    if (lc($basename1) eq lc($basename2)) {
        print "\tautodeleted $file2\n";
        unlink $file2;
        return 1;
    }
    if ($basename2 =~ /^\d+\.\w+$/) {
        print "\tautodeleted $file2\n";
        unlink $file2;
        return 1;
    }
    if ($basename1 =~ /^\d+\.\w+$/) {
        print "\tautodeleted $file1\n";
        unlink $file1;
        return 1;
    }
    if ($onlyautodel) {
        print "\tuser has to decide\n";
        return 1;
    }
}
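I run it along these lines (the script name and viewer path are just examples; any viewer that takes a file name on its command line should do):

perl duplicity.pl -autodel "C:\Program Files\IrfanView\i_view32.exe" C:\Pictures D:\Backup\Pictures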
It scans several directories (including subdirectories), computes the MD5 hash of each file, stores the hashes in a (tied) hash and reports the duplicates. With certain parameters it even automatically deletes some of them. The duplicate images are opened in an image viewer so that I can choose which one to delete based on the name and path. It's Windows-only, but I think it would be no big deal to port it to Unix; I just need to create two processes and wait until they both exit, which AFAIK has to be implemented differently on the two OSes.
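On Unix the two Win32::Process::Create / Wait(INFINITE) calls could be replaced by fork/exec/waitpid. A rough, untested sketch (show_both is just an illustrative name, not something the script above defines):

sub show_both {
    my ($viewer, $file1, $file2) = @_;
    my @pids;
    for my $file ($file1, $file2) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            exec $viewer, $file;       # child: replace itself with the viewer
            die "exec $viewer failed: $!";
        }
        push @pids, $pid;              # parent: remember the child PID
    }
    waitpid $_, 0 for @pids;           # block until both viewers have exited
}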

Jenda
XML sucks. Badly. SOAP on the other hand is the most powerful vacuum pump ever invented.