in reply to Duplicates in Directories
Even on Windows XP, a directory with 10K or 20K files is no big deal. With the Windows NTFS file system there are very good reasons not to put a lot of files in the root directory (C:\), but you aren't doing that. A sub-directory can have 50K files with no problem.
I would read the "target directory" and then code what I call an "execution plan". Moving files is a "destructive operation" because it modifies the input data. Copying files is not destructive, but takes longer.
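Here is a minimal sketch of such an execution plan. It assumes the goal is to pair up files that share a base name and have both a .pdf and an .epub extension. The hypothetical `build_plan` sub only records intended operations as [op, source, destination] records and prints them, so nothing on disk is touched. The `duplicates` target directory and the choice to move the .pdf rather than the .epub are placeholders, not part of the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch: turn a list of file names into an "execution plan",
# i.e. a list of intended operations, without touching the disk.
# The target directory name and "move the .pdf" are assumptions.
sub build_plan {
    my ($dup_dir, @names) = @_;
    my %HoH;    # {extension}{name}
    for my $full_name (@names) {
        next if $full_name =~ /^\./;    # skip ., .. and dot-files
        my ($name, $ext) = $full_name =~ /([\w.]+)\.(\w+)$/;
        next unless defined $ext;       # skip bare names without an .extension
        $HoH{$ext}{$name} = 1;
    }
    my @plan;
    for my $name (sort keys %{ $HoH{pdf} || {} }) {
        next unless exists $HoH{epub}{$name};    # only names that have both
        push @plan, [ 'move', "$name.pdf", "$dup_dir/$name.pdf" ];
    }
    return @plan;
}

my @plan = build_plan('duplicates', qw(baz.pdf baz.epub bar.epub boo.txt));
print "$_->[0] $_->[1] -> $_->[2]\n" for @plan;    # review before executing
```

Because the plan is plain data, you can print it, sort it, or count it, and only hand it to a separate "execute" step once it looks right.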
Anyway, I would code the basic algorithm and leave the actual moving or copying of files to a final step. I often define a constant like `use constant ENABLE_MOVE => 0;` and run the code to make sure that it is going to do what I want before I turn that flag "on".
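A sketch of the `ENABLE_MOVE` idea, using `move` from the core File::Copy module. The `do_move` helper and the example paths are hypothetical. Since `ENABLE_MOVE` is a compile-time constant, with it set to 0 the real `move` call is never executed and the sub only reports what it would do.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy qw(move);

# Set to 1 only after the dry-run output looks right.
use constant ENABLE_MOVE => 0;

# Hypothetical helper: carry out (or just report) one planned move.
sub do_move {
    my ($src, $dst) = @_;
    if (ENABLE_MOVE) {
        move($src, $dst) or die "move '$src' -> '$dst' failed: $!";
        return "moved $src -> $dst";
    }
    return "would move $src -> $dst";    # dry run: nothing on disk changes
}

print do_move('baz.pdf', 'duplicates/baz.pdf'), "\n";
```

Flipping the constant to 1 is the only change needed to go from dry run to the real operation, so the code you tested is the code you run.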
Your code should take only a few seconds to decide what to do. Take the actual move or copy out of the equation until you have an efficient algorithm. Below, I just print what would happen. Get that working efficiently, then "turn on" the actual file operation(s).
```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dump qw(pp);
$| = 1;    # turn off buffering to STDOUT for debugging

my %HoH;   # {extension}{name}
while (my $full_name = <DATA>) {
    chomp $full_name;
    next if $full_name =~ /^\./;    # skip names beginning with a dot
    my ($name, $ext) = $full_name =~ /([\w.]+)\.(\w+)$/;
    next unless defined $ext;       # skip bare names without an .extension
    $HoH{$ext}{$name} = 1;
}
pp \%HoH;

foreach my $pdf_file (keys %{ $HoH{pdf} }) {
    if (exists $HoH{epub}{$pdf_file}) {
        print "do something with $pdf_file.pdf and $pdf_file.epub\n";
    }
}

=prints
{
  doc => { baz => 1 },
  epub => { bar => 1, baz => 1, boo => 1 },
  pdf => { baz => 1 },
  txt => { "baz" => 1, "boo" => 1, "some.long.name" => 1 },
}
do something with baz.pdf and baz.epub
=cut

__DATA__
.
..
some.long.name.txt
baz.txt
baz.epub
baz.doc
baz.pdf
bar.epub
boo.epub
boo.txt
barefile
```
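The code above reads names from __DATA__ for testing. To run the same logic against a real directory, a sketch along these lines builds the same {extension}{name} hash with `opendir`/`readdir`. The `index_dir` sub name and taking the directory from the command line (defaulting to the current directory) are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: build the {extension}{name} index from a real directory
# instead of the __DATA__ list. index_dir is a hypothetical name.
sub index_dir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "cannot open '$dir': $!";
    my %HoH;    # {extension}{name}
    while (defined(my $full_name = readdir $dh)) {
        next if $full_name =~ /^\./;         # skips ., .. and dot-files
        next unless -f "$dir/$full_name";    # plain files only
        my ($name, $ext) = $full_name =~ /([\w.]+)\.(\w+)$/;
        next unless defined $ext;            # skip bare names
        $HoH{$ext}{$name} = 1;
    }
    closedir $dh;
    return \%HoH;
}

my $dir = shift(@ARGV) // '.';
my $HoH = index_dir($dir);
for my $name (sort keys %{ $HoH->{pdf} || {} }) {
    print "do something with $name.pdf and $name.epub\n"
        if exists $HoH->{epub}{$name};
}
```

Note that the extension match is case-sensitive. Windows file names are case-insensitive, so if some files are named like BAZ.PDF you may want to store `lc $ext` instead.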