
If I'm reading your description correctly, you are passing over the list of files two or three times (possibly doing steps 1 and 2 in one pass). Step 3 also does a linear search over the file names for each file, which makes the processing quadratic in the number of files.

Maybe try to gather all the info in one pass, store the details in a hash structure (below, a hash of hashes), and then iterate over that hash. Hash keys give you exact matches and avoid the linear searches.
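To make the cost difference concrete, here is a minimal sketch contrasting the two lookup styles. The file list and the `@targets` names are made up for illustration. The `grep` version scans the whole list for every lookup (O(n) each, so O(n²) if you do it once per file), while the hash version builds an index in one pass and then does an average constant-time `exists` check per lookup.

```perl
use strict;
use warnings;

# Hypothetical file list standing in for a directory listing.
my @allfiles = qw(baz.txt baz.epub baz.pdf bar.epub boo.txt);

# Hypothetical names we want to check for.
my @targets = qw(baz.pdf bar.pdf boo.txt);

# Linear search: each lookup scans the whole list, so doing
# this for every file is quadratic overall.
my @found_linear = grep {
    my $t = $_;
    grep { $_ eq $t } @allfiles;    # count of matches, used as a boolean
} @targets;

# Hash lookup: build the index once (one pass), then each
# lookup is an average constant-time exists check.
my %is_file = map { $_ => 1 } @allfiles;
my @found_hash = grep { exists $is_file{$_} } @targets;

print "linear: @found_linear\n";
print "hash:   @found_hash\n";
```

Both lines print `baz.pdf boo.txt`; only the amount of work differs as the list grows.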

use strict;
use warnings;

#  I assume the actual code will obtain this list using
#  glob or similar
my @allfiles = qw /
baz.txt
baz.epub
baz.doc
baz.pdf
bar.epub
boo.epub
boo.txt
/;

my %by_ext;
#  %by_ext is not actually used below,
#  but is maybe useful for other things
#  so I have left it in
my %by_fbase;

foreach my $file (@allfiles) {
    #  should use a proper filename parser here
    #  like File::Basename, but splitting on the last
    #  dot will serve for the purposes of an example.
    my ($name, $ext) = split /\.(?=[^.]*\z)/, $file, 2;
    next unless defined $ext;    #  skip files with no extension

    $by_ext{$ext}{$name} = $file;
    $by_fbase{$name}{$ext} = $file;
}

foreach my $name (keys %by_fbase) {
    #  $by_fbase{$name} always exists here, so reading its
    #  {pdf} and {epub} entries does not autovivify anything
    #  (no need for the CPAN autovivification pragma).
    #  You could use exists in this check, but file names
    #  should evaluate to true if they have an extension,
    #  even if the name part evaluates to false (e.g. "0.pdf").
    if ($by_fbase{$name}{pdf} && $by_fbase{$name}{epub}) {
        #  do stuff
        print "$name has epub and pdf extensions: "
          . "$by_fbase{$name}{epub} $by_fbase{$name}{pdf}\n";
        #  now do stuff like moving files since you can iterate over
        #  the values of the relevant subhash
        foreach my $file (values %{$by_fbase{$name}}) {
            print "now do something to $file\n";
        }
        
    }
}

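That code prints the following (the order of the "now do something" lines can differ between runs, because Perl iterates over hashes in an unpredictable order):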

baz has epub and pdf extensions: baz.epub baz.pdf
now do something to baz.epub
now do something to baz.doc
now do something to baz.txt
now do something to baz.pdf
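The comment in the loop suggests File::Basename instead of `split`. Here is a minimal sketch of that parsing step, assuming the same sort of file list (two `a.b.*` names are added for illustration). `fileparse` with the suffix pattern `qr/\.[^.]*/` strips only the last extension, dot included, so `a.b.pdf` gives base `a.b` and suffix `.pdf`, where a plain `split /\./` would give `a` and `b`. Sorting the keys also makes the output order repeatable.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

my @allfiles = qw(baz.txt baz.epub baz.doc baz.pdf bar.epub
                  boo.epub boo.txt a.b.pdf a.b.epub);

my %by_fbase;
foreach my $file (@allfiles) {
    # fileparse returns (name, directory, matched suffix);
    # the suffix keeps its leading dot, so strip it for the key.
    my ($name, undef, $suffix) = fileparse($file, qr/\.[^.]*/);
    next if $suffix eq '';    # skip files with no extension
    (my $ext = $suffix) =~ s/^\.//;
    $by_fbase{$name}{$ext} = $file;
}

# Sorted keys give a repeatable order; plain hash iteration does not.
my @matches;
foreach my $name (sort keys %by_fbase) {
    next unless exists $by_fbase{$name}{pdf} && exists $by_fbase{$name}{epub};
    push @matches, $name;
    print "$name has epub and pdf; files: "
        . join(' ', map { $by_fbase{$name}{$_} } sort keys %{ $by_fbase{$name} })
        . "\n";
}
```

With this list it reports `a.b` and `baz`, each with its files listed in extension order.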

Update: Edited incomplete comment starting with "now do stuff"
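The `%by_ext` index built in the first loop (not used in the code above) can answer the same question: take the base names stored under `pdf` and keep those that also exist under `epub`. This is a minimal sketch assuming the same layout, `$by_ext{$ext}{$name} = $file`; it only walks the pdf entries rather than every base name.

```perl
use strict;
use warnings;

my @allfiles = qw(baz.txt baz.epub baz.doc baz.pdf bar.epub boo.epub boo.txt);

# Same layout as above: extension => { base name => file name }.
my %by_ext;
foreach my $file (@allfiles) {
    # Split on the last dot only; skip names with no extension.
    my ($name, $ext) = split /\.(?=[^.]*\z)/, $file, 2;
    next unless defined $ext;
    $by_ext{$ext}{$name} = $file;
}

# Intersection of the pdf and epub name sets. The // {} fallbacks
# keep this safe if either extension is absent entirely.
my $pdfs  = $by_ext{pdf}  // {};
my $epubs = $by_ext{epub} // {};
my @both  = grep { exists $epubs->{$_} } sort keys %$pdfs;

print "$_: $epubs->{$_} $pdfs->{$_}\n" for @both;
```

For the example list this prints `baz: baz.epub baz.pdf`.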

In reply to Re: Duplicates in Directories by swl
in thread Duplicates in Directories by kel
