search for duplicated via IPTC field (OS X)

goppy has asked for the wisdom of the Perl Monks concerning the following question:

I am looking to filter a folder of images by an IPTC value so that I can find the duplicates that riddle my collection: sometimes five versions of a single image exist, each with a different file name. The key IPTC field is "Headline" (or "IPTC:By-line"), which holds the duplicated value. When a duplicate is found, the image is moved to a folder separate from the main collection.

The following code works together with a program called ExifTool (http://www.sno.phy.queensu.ca/~phil/exiftool/), which is how it is able to read the IPTC data:

#!/usr/bin/perl
use strict;
use warnings;
BEGIN { unshift @INC, '/usr/bin/lib' }   # ExifTool's lib directory on this system; adjust as needed
use Image::ExifTool;
use File::Basename qw(basename);
use File::Copy qw(move);
print "Using ExifTool version $Image::ExifTool::VERSION\n";

@ARGV > 2 or die "Syntax: script TAG DIR FILE [FILE...]\n";

my $tag    = shift;
my $dstdir = shift;
-d $dstdir or die "Destination directory $dstdir does not exist\n";

my $exifTool = Image::ExifTool->new;
my $moved = 0;
my %foundValue;    # tag value => first file seen with that value
foreach my $file (@ARGV) {
    my $info = $exifTool->ImageInfo($file, $tag);
    unless (%$info) {
        warn "$tag not found in $file\n";
        next;
    }
    my ($val) = values %$info;
    if (exists $foundValue{$val}) {
        # duplicate value, so move to destination directory
        print "$tag is the same in $file as $foundValue{$val}\n";
        my $dst = "$dstdir/" . basename($file);   # drop the source directory name
        if (-e $dst) {
            warn "$dst already exists!\n";
        } elsif (not move($file, $dst)) {         # unlike rename, move also works across volumes
            warn "Error moving $file to $dst: $!\n";
        } else {
            print "  --> moved\n";
            ++$moved;
        }
    } else {
        $foundValue{$val} = $file;  # save first file with this value
    }
}
printf "%5d files processed\n", scalar(@ARGV);
printf "%5d files moved\n", $moved;
# end

Even so, that would still leave many images to sort, so I would like to modify this code so that each group of images sharing a Headline value (for example, all the ones whose Headline is "Henry Ford Clinic") is placed in a single folder named after that IPTC value; in this case the folder would be called "Henry Ford Clinic".
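One possible way to do the per-value folders, as a minimal sketch using only core modules. The helpers `folder_name_for` and `move_into_group` are hypothetical names, not part of ExifTool, and replacing `/` and `:` is an assumption about which characters are unsafe in a folder name on OS X.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Copy qw(move);
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: turn an IPTC value into a usable folder name.
# '/' separates path components and ':' is awkward in the Finder,
# so both become '_'; surrounding whitespace is trimmed.
sub folder_name_for {
    my ($value) = @_;
    (my $name = $value) =~ s{[/:]}{_}g;
    $name =~ s/^\s+|\s+$//g;
    return length $name ? $name : 'untitled';
}

# Hypothetical helper: move $file into "$dstdir/<folder for $value>",
# creating that folder the first time it is needed.
# Returns the new path, or undef if the move did not happen.
sub move_into_group {
    my ($file, $dstdir, $value) = @_;
    my $groupdir = File::Spec->catdir($dstdir, folder_name_for($value));
    make_path($groupdir) unless -d $groupdir;
    my $dst = File::Spec->catfile($groupdir, basename($file));
    if (-e $dst) {
        warn "$dst already exists!\n";
        return;
    }
    unless (move($file, $dst)) {
        warn "Error moving $file to $dst: $!\n";
        return;
    }
    return $dst;
}
```

In the loop above, calling `move_into_group($file, $dstdir, $val)` for every file that has the tag, rather than only for duplicates, would leave one folder per Headline value.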

On top of that, some images are the same but their Headline ends with "-framed in mahogany": these images have a faux frame but are essentially the same image. I thought that if I could restrict the comparison to, say, the first 15 characters, I would catch all of these duplicates as well. This has not been programmed in yet either.
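A sketch of two possible comparison keys, under stated assumptions: `prefix_key` follows the 15-character idea from the question, while `suffix_key` assumes the variants always end in "-framed in ..." (both names are hypothetical).

```perl
use strict;
use warnings;

# Number of leading characters to compare, as proposed in the question.
use constant PREFIX_LEN => 15;

# Comparison key: the first PREFIX_LEN characters of the tag value.
# substr() returns the whole string when it is shorter than PREFIX_LEN.
sub prefix_key {
    my ($value) = @_;
    return substr($value, 0, PREFIX_LEN);
}

# Alternative key (assumes variants end in "-framed in ..."):
# strip that ending so that only this one variation is ignored.
sub suffix_key {
    my ($value) = @_;
    (my $key = $value) =~ s/\s*-\s*framed in .*\z//i;
    return $key;
}
```

In the loop above, either key would replace `$val` as the hash key, e.g. `my $key = prefix_key($val);` and then `$foundValue{$key}`. The prefix key is simpler but may merge genuinely different headlines that happen to share their first 15 characters; the suffix key ignores only the known variation.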


Re: search for duplicated via IPTC field (OS X) by Utilitarian (Vicar) on Aug 09, 2010 at 19:32 UTC
So rather than just moving files to the destination directory, you want to create subdirectories in it based on the first n characters of the value; mkdir is the function you need for that. There are many ways to select the first fifteen characters, but substr is probably the most useful in this case. Have a go at that and let us know how you get on. `print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."`
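Combining mkdir and substr as suggested, a dry run that only prints the planned grouping may be worth doing before any folders are created. This is a sketch: `group_by_prefix` is a hypothetical helper, and the file names and headlines are made-up sample data standing in for values read with ExifTool's ImageInfo().

```perl
use strict;
use warnings;

# Collect files into buckets keyed by the first $len characters
# of their tag value (a hash of array refs).
sub group_by_prefix {
    my ($value_for, $len) = @_;
    my %groups;    # prefix => [files sharing that prefix]
    for my $file (sort keys %$value_for) {
        my $key = substr($value_for->{$file}, 0, $len);
        push @{ $groups{$key} }, $file;
    }
    return \%groups;
}

# Hypothetical sample data: file name => Headline value.
my %headline = (
    'img_001.jpg' => 'Henry Ford Clinic',
    'img_002.jpg' => 'Henry Ford Clinic -framed in mahogany',
    'img_003.jpg' => 'Some other headline',
);

my $groups = group_by_prefix(\%headline, 15);
for my $key (sort keys %$groups) {
    # Dry run: show the folder each group would get, without calling mkdir.
    printf "%s: %s\n", $key, join(', ', @{ $groups->{$key} });
}
```

With this sample data it prints `Henry Ford Clin: img_001.jpg, img_002.jpg` and `Some other head: img_003.jpg`, which also shows that a folder named from a 15-character prefix may be truncated mid-word.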
Re^2: search for duplicated via IPTC field (OS X) by goppy (Initiate) on Aug 09, 2010 at 23:12 UTC
So this would go from character 0 up to the 15th? `my $oneName = substr($names, 0, 15);`
Re^3: search for duplicated via IPTC field (OS X) by Utilitarian (Vicar) on Aug 10, 2010 at 07:09 UTC
It doesn't hurt to try it out yourself; in fact you may discover new things and waste a whole afternoon explaining the behaviour you observe ;) `$ perl -e 'my $names="12345678901234567890"; my $oneName = substr($names, 0, 15); print "$oneName\n";'` prints `123456789012345`. So yes, that appears to do what you want. `print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."`
Re^4: search for duplicated via IPTC field (OS X) by goppy (Initiate) on Aug 10, 2010 at 16:46 UTC
Re^5: search for duplicated via IPTC field (OS X) by Utilitarian (Vicar) on Aug 10, 2010 at 20:19 UTC
Re: search for duplicated via IPTC field (OS X) by BrimBorium (Friar) on Aug 09, 2010 at 20:22 UTC
Asking specific questions will work better than posting the same text at bytes.com and seeing what happens... You have posted some code and you have a design in mind; that's the right way to go. If you get stuck, or something does not work as you expected, you will surely get some advice here. But it is easier to answer "How can I move a file?" than "Any comments on my code?". See How do I post a question effectively?