If I'm reading your description correctly, you are passing over the list of files two or three times (possibly doing steps 1 & 2 in one pass). Step 3 also does a linear search over the file names for each file, so the total work grows quadratically with the number of files.

Maybe try to get all the info in one pass, store the details in a hash structure, and then iterate over that hash. Hash keys only ever match exactly, so that takes care of the exact-match requirement, and each lookup replaces a linear search.

use strict;
use warnings;

#  I assume the actual code will obtain this list using
#  glob or similar
my @allfiles = qw /
    baz.txt baz.epub baz.doc baz.pdf
    bar.epub
    boo.epub boo.txt
/;

#  %by_ext is actually redundant below,
#  but is maybe useful for other things
#  so I have left it in
my %by_ext;
my %by_fbase;

foreach my $file (@allfiles) {
    #  should use a proper filename parser here
    #  like File::Basename, but a split will serve
    #  for the purposes of an example.
    my ($name, $ext) = split /\./, $file;
    $by_ext{$ext}{$name}   = $file;
    $by_fbase{$name}{$ext} = $file;
}

foreach my $name (keys %by_fbase) {
    #print "$name\n";
    no autovivification;  #  CPAN module, not in core
    #  could use exists in this check if you want to avoid autoviv,
    #  but file names should evaluate to true if they have
    #  an extension, even if the name part evaluates to false
    if ($by_fbase{$name}{pdf} && $by_fbase{$name}{epub}) {
        #  do stuff
        print "$name has epub and pdf extensions: "
            . "$by_fbase{$name}{epub} $by_fbase{$name}{pdf}\n";
        #  now do stuff like moving files since you can iterate over
        #  the values of the relevant subhash
        foreach my $file (values %{$by_fbase{$name}}) {
            print "now do something to $file\n";
        }
    }
}
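That code prints (the order of the last four lines can vary between runs, since Perl does not guarantee hash ordering):

baz has epub and pdf extensions: baz.epub baz.pdf
now do something to baz.epub
now do something to baz.doc
now do something to baz.txt
now do something to baz.pdf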
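
The comment in the loop says a proper filename parser would be better than split, since split /\./ breaks on names with more than one dot (my.book.epub would come out as name "my", extension "book"). Here is a minimal sketch of the same grouping using fileparse from File::Basename. The helper name group_by_basename is my own invention, lowercasing the extension is an assumption about what you want, and it assumes all files live in one directory, because the directory part is ignored when grouping.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

#  Hypothetical helper: returns a hashref of
#  { base name => { lowercased extension => original file name } }
sub group_by_basename {
    my @files = @_;
    my %by_fbase;
    foreach my $file (@files) {
        #  the pattern matches only the last ".something" in the name
        my ($name, $dir, $suffix) = fileparse($file, qr/\.[^.]*/);
        next if $suffix eq '';      #  skip files with no extension
        my $ext = lc $suffix;       #  assumption: treat .PDF and .pdf alike
        $ext =~ s/^\.//;            #  drop the leading dot
        $by_fbase{$name}{$ext} = $file;
    }
    return \%by_fbase;
}

my $groups = group_by_basename(qw(my.book.epub my.book.pdf notes.txt README));
foreach my $name (sort keys %$groups) {
    next unless exists $groups->{$name}{epub}
             && exists $groups->{$name}{pdf};
    print "$name has epub and pdf extensions: "
        . "$groups->{$name}{epub} $groups->{$name}{pdf}\n";
}
```

This version uses exists rather than truthiness in the check, so it does not need the autovivification module.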

Update: Edited incomplete comment starting with "now do stuff"


In reply to Re: Duplicates in Directories by swl
in thread Duplicates in Directories by kel
