I want to filter a folder of images by an IPTC value so that I can find the duplicates that riddle my collection. Sometimes five versions of a single image exist, all with different file names. The key IPTC field is "Headline" (or "IPTC:By-line"): duplicates share the same value in it. When a duplicate is found, the image is moved to a folder separate from the main collection.

The following code works with ExifTool (http://www.sno.phy.queensu.ca/~phil/exiftool/), which is how it reads the IPTC data:

#!/usr/bin/perl -w
use strict;
BEGIN { unshift @INC, '/usr/bin/lib' }
use Image::ExifTool;

print "Using ExifTool version $Image::ExifTool::VERSION\n";

@ARGV > 2 or die "Syntax: script TAG DIR FILE [FILE...]\n";

my $tag = shift;
my $dstdir = shift;
my $exifTool = new Image::ExifTool;
my $moved = 0;
my ($file, %foundValue);

foreach $file (@ARGV) {
    my $info = $exifTool->ImageInfo($file, $tag);
    unless (%$info) {
        warn "$tag not found in $file\n";
        next;
    }
    my ($val) = values %$info;
    if ($foundValue{$val}) {
        # duplicate value, so move to destination directory
        print "$tag is the same in $file as $foundValue{$val}\n";
        my $dst = $file;
        $dst =~ s{.*/}{};   # remove directory name
        $dst = "$dstdir/$dst";
        if (-e $dst) {
            warn "$dst already exists!\n";
        } elsif (not rename($file, $dst)) {
            warn "Error moving $file to $dst\n";
        } else {
            print "  --> moved\n";
            ++$moved;
        }
    } else {
        $foundValue{$val} = $file;  # save first file with this value
    }
}

printf "%5d files processed\n", scalar(@ARGV);
printf "%5d files moved\n", $moved;
# end

However, that would still leave many images to sort. I would like the code modified so that each group of images sharing a value (for example, all the ones with "Henry Ford Clinic" in the Headline field) is placed in a single folder named after that value, so the folder would be called "Henry Ford Clinic".
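One way the grouping could look is sketched below. It is not a tested drop-in replacement: the helper names folder_name, group_by_value and move_groups are made up for this illustration, and it assumes that every file in a group (including the first one found) should go into the folder, and that characters such as "/" and ":" in the value should become underscores in the folder name. Only core modules are used apart from Image::ExifTool, which is loaded only when the script is run with arguments.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Copy qw(move);
use File::Path qw(make_path);

# Turn a tag value into a safe folder name (hypothetical helper):
# path separators and control characters become underscores.
sub folder_name {
    my ($val) = @_;
    $val =~ s/^\s+|\s+$//g;          # trim surrounding whitespace
    $val =~ s{[/\\:\x00-\x1f]}{_}g;  # characters that are unsafe in paths
    return length $val ? $val : '_empty';
}

# Group file names by tag value: returns { value => [files...] }.
sub group_by_value {
    my ($value_of) = @_;             # hash ref: file => tag value
    my %group;
    push @{ $group{ $value_of->{$_} } }, $_ for sort keys %$value_of;
    return \%group;
}

# Move every group with more than one file into "$dstdir/<value>/".
# Returns the number of files moved.
sub move_groups {
    my ($dstdir, $group) = @_;
    my $moved = 0;
    for my $val (sort keys %$group) {
        my @files = @{ $group->{$val} };
        next if @files < 2;          # a single file is not a duplicate
        my $dir = "$dstdir/" . folder_name($val);
        make_path($dir);
        for my $file (@files) {
            my $dst = "$dir/" . basename($file);
            if (-e $dst) { warn "$dst already exists!\n"; next; }
            if (move($file, $dst)) { ++$moved }
            else { warn "Error moving $file to $dst: $!\n" }
        }
    }
    return $moved;
}

# Command-line use, same arguments as before: script TAG DIR FILE [FILE...]
if (@ARGV > 2) {
    require Image::ExifTool;
    my ($tag, $dstdir, @files) = @ARGV;
    my $exifTool = Image::ExifTool->new;
    my %value_of;
    for my $file (@files) {
        my $info = $exifTool->ImageInfo($file, $tag);
        my ($val) = values %$info;
        if (defined $val) { $value_of{$file} = $val }
        else { warn "$tag not found in $file\n" }
    }
    printf "%5d files moved\n",
        move_groups($dstdir, group_by_value(\%value_of));
}
```

Unlike the original loop, this reads all the values first and moves files afterwards, because a group is only known to contain duplicates once a second file with the same value has turned up.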

On top of that, some images are the same but have "-framed in mahogany" on the end: these have a faux frame but are essentially the same image. I thought that if I restricted the data compared to, say, the first 15 characters, I would catch these duplicates as well. This has not been programmed in either.
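The prefix comparison could be sketched as below. The helper name compare_key is made up for this illustration, the three sample values are invented, and the sketch assumes the "-framed in mahogany" suffix is part of the tag value being compared.

```perl
use strict;
use warnings;

# Reduce a tag value to the part that is compared (hypothetical helper).
sub compare_key {
    my ($val, $len) = @_;
    $val =~ s/^\s+//;                # ignore leading whitespace
    return substr($val, 0, $len);    # shorter values are returned whole
}

my %first_seen;                      # comparison key => first value seen
for my $val ('Henry Ford Clinic',
             'Henry Ford Clinic-framed in mahogany',
             'Henry Moore') {
    my $key = compare_key($val, 15);
    if (exists $first_seen{$key}) {
        print "'$val' duplicates '$first_seen{$key}'\n";
    } else {
        $first_seen{$key} = $val;
    }
}
# prints: 'Henry Ford Clinic-framed in mahogany' duplicates 'Henry Ford Clinic'
```

In the script above, the same idea would mean using compare_key($val, 15) as the key into %foundValue instead of $val itself. A short prefix can also merge unrelated values that happen to start with the same 15 characters, so the length may need tuning for the collection.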


In reply to search for duplicated via IPTC field (OS X) by goppy
