I don't know why this is taking so long.

Even on Windows XP, a directory with 10K or 20K files is no big deal. With the Windows NTFS file system there are very good reasons not to put a lot of files in the "root C:\" directory, but you aren't doing that. A sub-directory can have 50K files with no problem.

I would read the "target directory" and then code what I call an "execution plan". Moving files is a "destructive operation" because it modifies the input data. Copying files is not destructive, but takes longer.

Anyway, I would code the basic algorithm and leave the actual file moving or copying to a final step. I often define a constant like use constant ENABLE_MOVE => 0; and run the code to make sure that it is going to do what I want before I turn that constant "on".
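Here is a minimal sketch of that dry-run switch. The helper name do_move, the @plan list and the dupes/ destination are made up for illustration; only the ENABLE_MOVE constant comes from the description above, and move() is the one from the core File::Copy module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy qw(move);

use constant ENABLE_MOVE => 0;   # flip to 1 only after the dry run looks right

# Hypothetical helper: performs one planned move, or only reports it
# while ENABLE_MOVE is off. Returns a string describing what happened.
sub do_move {
    my ($from, $to) = @_;
    unless (ENABLE_MOVE) {
        return "would move $from -> $to";
    }
    move($from, $to) or die "move $from -> $to failed: $!";
    return "moved $from -> $to";
}

# Hypothetical execution plan: pairs of [source, destination].
my @plan = ( [ 'baz.pdf', 'dupes/baz.pdf' ] );

foreach my $step (@plan) {
    my $msg = do_move(@$step);
    print "$msg\n";
}
```

With ENABLE_MOVE at 0 nothing on disk is touched, so the plan can be run repeatedly while the selection logic is being debugged.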

Your code should take only a few seconds to decide what to do. Take the actual move or copy out of the equation until you have an efficient algorithm. Below I just print an intention of what would happen. Get that working efficiently, then "turn on" the actual file operation(s).

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dump qw(pp);
$|=1; #turn off buffering to stdout for debugging

my %HoH;  #{extension}{name}

while (my $full_name = <DATA>)
{
    next if $full_name =~ /^\./;   # skip names beginning with dot
    my ($name, $ext) = $full_name =~ /([\w.]+)\.(\w+)$/;
    next unless defined $ext;      # skip bare names without .extension
    $HoH{$ext}{$name}=1;
}
pp \%HoH;

foreach my $pdf_file (keys %{$HoH{pdf}})
{
    if (exists $HoH{epub}{$pdf_file})
    {
        print "do something with $pdf_file.pdf and $pdf_file.epub\n";
    }
}

=prints
{
  doc  => { baz => 1 },
  epub => { bar => 1, baz => 1, boo => 1 },
  pdf  => { baz => 1 },
  txt  => { "baz" => 1, "boo" => 1, "some.long.name" => 1 },
}
do something with baz.pdf and baz.epub
=cut

__DATA__
.
..
some.long.name.txt
baz.txt
baz.epub
baz.doc
baz.pdf
bar.epub
boo.epub
boo.txt
barefile
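The test above reads its names from __DATA__. Against a real directory, the same {extension}{name} hash could be built with opendir/readdir, roughly as sketched here. The sub name build_index and the default of the current directory are my assumptions, not something from the original question, and it still only prints intentions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref shaped like %HoH above,
# {extension}{name} => 1, for the plain files directly inside $dir.
sub build_index {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "cannot open $dir: $!";
    my %index;
    while (defined(my $entry = readdir $dh)) {
        next if $entry =~ /^\./;           # skip ., .. and dot files
        next unless -f "$dir/$entry";      # plain files only
        my ($name, $ext) = $entry =~ /([\w.]+)\.(\w+)$/;
        next unless defined $ext;          # skip names without .extension
        $index{$ext}{$name} = 1;
    }
    closedir $dh;
    return \%index;
}

my $dir   = shift // '.';                  # assumed default: current directory
my $index = build_index($dir);

foreach my $name (sort keys %{ $index->{pdf} || {} }) {
    print "do something with $name.pdf and $name.epub\n"
        if exists $index->{epub}{$name};
}
```

One readdir pass plus hash lookups is cheap even for tens of thousands of entries, so any slowness should show up in the file operations, not in this step.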

In reply to Re: Duplicates in Directories by Marshall
in thread Duplicates in Directories by kel
