If I'm reading your description correctly, you are passing over the list of files two or three times (possibly doing steps 1 & 2 in one pass). Step 3 also does a linear search over the file names for each file, which makes the overall processing quadratic in the number of files.
Maybe try to get all the info in one pass, store the details in a hash structure, and then iterate over that hash. That will take care of the need for exact matches and avoid linear searches.
    use strict;
    use warnings;

    #  I assume the actual code will obtain this list using
    #  glob or similar
    my @allfiles = qw /
        baz.txt  baz.epub baz.doc baz.pdf
        bar.epub
        boo.epub boo.txt
    /;

    #  %by_ext is actually redundant below,
    #  but is maybe useful for other things
    #  so I have left it in
    my %by_ext;
    my %by_fbase;

    foreach my $file (@allfiles) {
        #  should use a proper filename parser here
        #  like File::Basename, but a split will serve
        #  for the purposes of an example.
        my ($name, $ext) = split /\./, $file;
        $by_ext{$ext}{$name}   = $file;
        $by_fbase{$name}{$ext} = $file;
    }

    foreach my $name (keys %by_fbase) {
        #print "$name\n";
        no autovivification;  #  needs the autovivification module from CPAN
        #  could use exists in this check if you want to avoid autoviv,
        #  but file names should evaluate to true if they have
        #  an extension, even if the name part evaluates to false
        if ($by_fbase{$name}{pdf} && $by_fbase{$name}{epub}) {
            #  do stuff
            print "$name has epub and pdf extensions: "
                . "$by_fbase{$name}{epub} $by_fbase{$name}{pdf}\n";
            #  now do stuff like moving files since you can iterate over
            #  the values of the relevant subhash
            foreach my $file (values %{$by_fbase{$name}}) {
                print "now do something to $file\n";
            }
        }
    }

That code prints (the order of the "now do something" lines can vary between runs, since hash order is not fixed):

    baz has epub and pdf extensions: baz.epub baz.pdf
    now do something to baz.epub
    now do something to baz.doc
    now do something to baz.txt
    now do something to baz.pdf
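
The comment in the loop says a proper filename parser such as File::Basename would be better than split. Here is a minimal sketch of that substitution, in case it helps. The sub name index_by_basename and the sample file names (including my.book.*) are made up for illustration. The suffix pattern is the one shown in the File::Basename docs, and it treats only the last dot-delimited piece as the extension, so a name like my.book.pdf is indexed under "my.book" rather than "my".

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

#  Hypothetical helper: builds the same name => ext => file
#  structure as %by_fbase, but splits on the last dot only.
sub index_by_basename {
    my @files = @_;
    my %by_fbase;
    foreach my $file (@files) {
        #  fileparse returns (name, directory, suffix); the suffix
        #  still has its leading dot, so strip that off.
        my ($name, $dir, $suffix) = fileparse($file, qr/\.[^.]*/);
        (my $ext = $suffix) =~ s/^\.//;
        $by_fbase{$name}{$ext} = $file;
    }
    return \%by_fbase;
}

#  Made-up sample list, just for the example
my $by_fbase = index_by_basename(qw/
    baz.txt baz.epub baz.pdf
    my.book.pdf my.book.epub
    bar.epub
/);

#  sort the keys so the output order is repeatable
foreach my $name (sort keys %$by_fbase) {
    next unless exists $by_fbase->{$name}{pdf}
             && exists $by_fbase->{$name}{epub};
    print "$name has epub and pdf extensions\n";
}
```

With that sample list it prints a line for baz and a line for my.book. Using exists on the inner keys here does not autovivify anything, since $name always comes from the keys of the hash itself.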
Update: Edited incomplete comment starting with "now do stuff"
In reply to Re: Duplicates in Directories by swl, in thread Duplicates in Directories by kel