shadowfox has asked for the wisdom of the Perl Monks concerning the following question:

Hey guys, I just wanted a few eyes on a script I'm using next week to move millions of files into folders based on the year and month of their last modified date.

I already built a working script, which for me is already an accomplishment lol, but I wanted to see if there was any better/faster way to accomplish the same task. Since I literally will be moving millions of files, I'd like to make the extra effort to make sure it's working as efficiently as I can.

It's very basic, but I commented things to make sure there is no confusion for anyone else reading it.

# Move files into new sub-folders based on their last modified date.
# A file modified in Jun 2011 goes into $indir\_2011\6\
# A file modified in Mar 1998 goes into $indir\_1998\3\

use File::Copy ; # Using for the move(); function
use File::Path qw(mkpath) ; # Using for the mkpath(); function
use Warnings ; # Cause we should

$indir = "C:\\Archive\\Program1\\Backup\\" ; # Script will start here looking for matching files
$total = 0 ; # Set incremental file counter to see how many files get processed each run.

chdir($indir) ; # Move from current working directory to user defined directory.
opendir(DIR, $indir) or die $! ; # Open $indir if we can, or fail reporting an error

while ($match = readdir(DIR)) { # While were in $indir with a matching file continue.
    if ($match =~ /\.txt|.pgp$/i) { # Match files read to .pgp or .txt ignoring others
        $newlocate = $outdir . $match ; # Set where we want to move the files to
        $movefile = $indir . $match ; # Set a matching file with its current directory

        # Build variables from the matching file we're working on with Stat
        ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size, $atime, $mtime, $ctime, $blksize, $blocks) = stat($movefile) ;

        # Build variables from the modified date of our matching file using localtime
        ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = localtime($mtime) ;

        $mday = sprintf('%02d', $mday) ; # Not used in this script, but returns a 2 digit day
        $mon = sprintf('%01d', ++$mon) ; # Returns a 1 digit month until we get to 10 then use 2, not my choice.
        $year = 1900 + $year ; # Returns a 4 digit year
        $outdir = "_" . $year . "\\" . $mon . "\\" ; # Build the directory name based on date where the file is going \_2011\9\

        unless (-d $outdir) { # Doesn't seem to work right, tries to create dir even if already exists
            mkpath($outdir, 1) ; # Create new directory for year with applicable subdir for month of matching file.
        }

        move($movefile, $newlocate) ; # Move matching file from its current location to its new date specific home.
        print "Moved " . $match . " to " . $outdir . "\n"; # Moved Filename.TXT to $indir\_YYYY\M\ or $indir\_YYYY\MM\
        $total++ ; # Increase file counter starting with 0 each time a file is moved.
    } ## end if ($match =~ /\.txt|.pgp$/i)
} ## end while ($match = readdir(DIR...))

print $total. " files moved.\n" ; # Prints our total count, ie: 1030509 files moved.
closedir(DIR) ; # Were done, close DIR handle on $indir
exit 0 ; # Exit gracefully.

Re: Moving files to subfolders based on their last modified date
by jwkrahn (Abbot) on Aug 19, 2011 at 20:22 UTC
    use File::Copy ; # Using for the move(); function

    Or you could just use perl's built-in rename function.
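A minimal sketch of the rename suggestion, assuming the source and destination are on the same volume (the built-in rename generally cannot move a file across filesystems, which is the case File::Copy's move handles by copying and deleting). The helper name rename_or_warn is hypothetical, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: move a file with the built-in rename and report failure.
# Assumes $from and $to live on the same volume.
sub rename_or_warn {
    my ($from, $to) = @_;

    # rename returns true on success and sets $! on failure.
    if (rename $from, $to) {
        return 1;
    }

    warn "Cannot rename '$from' to '$to': $!\n";
    return 0;
}
```

Returning a true or false value lets the caller decide whether to count the file as moved.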



    use Warnings ; # Cause we should

    That should be:

    use warnings; use strict;


    chdir($indir) ; # Move from current working directory to user defined directory.

    You should verify that chdir worked correctly:

    # Move from current working directory to user defined directory.
    chdir $indir or die "Cannot chdir to '$indir' because: $!";


    if ($match =~ /\.txt|.pgp$/i) { # Match files read to .pgp or .txt ignoring others

    You are saying: match the string '.txt' anywhere in the file name, OR match any character followed by the string 'pgp' only at the end of the file name. It looks like you want:

    if ( $match =~ /\.(?:txt|pgp)$/i ) { # Match files read to .pgp or .txt ignoring others
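To see the difference between the two patterns, here is a small sketch that runs both against a few made-up file names (the names are illustrative only, not taken from the real archive).

```perl
use strict;
use warnings;

my $loose  = qr/\.txt|.pgp$/i;       # pattern from the original script
my $strict = qr/\.(?:txt|pgp)$/i;    # anchored pattern suggested in this reply

# Hypothetical file names chosen to expose the difference.
for my $name ('report.txt', 'key.PGP', 'notes.txt.bak', 'xpgp') {
    printf "%-14s loose=%d strict=%d\n",
        $name,
        ($name =~ $loose  ? 1 : 0),
        ($name =~ $strict ? 1 : 0);
}
```

The loose pattern accepts 'notes.txt.bak' (it contains '.txt') and 'xpgp' (any character followed by 'pgp' at the end), while the anchored pattern rejects both.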


    $newlocate = $outdir . $match ; # Set where we want to move the files to

    You don't define $outdir until later in the loop so this won't work very well.



    $mday = sprintf('%02d', $mday) ; # Not used in this script, but returns a 2 digit day
    $mon = sprintf('%01d', ++$mon) ; # Returns a 1 digit month until we get to 10 then use 2, not my choice.
    $year = 1900 + $year ; # Returns a 4 digit year
    $outdir = "_" . $year . "\\" . $mon . "\\" ; # Build the directory name based on date where the file is going \_2011\9\

    If you are not using $mday then why are you defining it? More simply written as:

    $outdir = "_" . ($year+1900) . "\\" . ($mon+1) . "\\" ; # Build the directory name based on date where the file is going \_2011\9\
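As a sketch of how that one-liner could sit in a small function, assuming only the month and year are needed from localtime (the name outdir_for_mtime is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: build the relative target directory for a given mtime.
sub outdir_for_mtime {
    my ($mtime) = @_;

    # Slice out only the month (0-11) and year (years since 1900).
    my ($mon, $year) = (localtime $mtime)[4, 5];

    return "_" . ($year + 1900) . "\\" . ($mon + 1) . "\\";
}

# Example: directory for a file modified right now.
print outdir_for_mtime(time), "\n";
```

Because the directory is computed from the file's own mtime each time, nothing is carried over from a previous loop iteration.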


    $total++ ; # Increase file counter starting with 0 each time a file is moved.

    Since you don't verify that move performed correctly how do you know that this number is correct?
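One way to keep the counter honest is to increment it only when move reports success. This is a sketch under that assumption; move_and_count is a hypothetical helper, and the real script would call move inside its readdir loop instead of taking a list.

```perl
use strict;
use warnings;
use File::Copy qw(move);

# Hypothetical helper: move each [source, destination] pair and
# count only the moves that actually succeeded.
sub move_and_count {
    my @pairs = @_;
    my $moved = 0;

    for my $pair (@pairs) {
        my ($from, $to) = @$pair;

        # File::Copy::move returns 1 on success and 0 on failure.
        if (move($from, $to)) {
            $moved++;
        }
        else {
            warn "Could not move '$from' to '$to': $!\n";
        }
    }

    return $moved;
}
```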

      $mday = sprintf('%02d', $mday) ; # Not used in this script, but returns a 2 digit day
      $mon = sprintf('%01d', ++$mon) ; # Returns a 1 digit month until we get to 10 then use 2, not my choice.
      $year = 1900 + $year ; # Returns a 4 digit year
      $outdir = "_" . $year . "\\" . $mon . "\\" ; # Build the directory name based on date where the file is going \_2011\9\
      Even simpler is the following, which also replaces some of the code above it:
      my $ym = Time::Piece->new($mtime)->strftime("%Y\\%m");
      # For single digit month:
      $ym =~ s/0(\d)$/$1/;
      my $outdir = "_$ym\\";
      Update: Although in the OP I now notice that a single digit month (where possible) is required...so a further tweak would be required to this (updated again to reflect this).
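An alternative sketch that avoids the regex tweak, assuming Time::Piece's year and mon accessors are acceptable: mon already returns 1 to 12 without a leading zero. The function name outdir_from_piece is hypothetical.

```perl
use strict;
use warnings;
use Time::Piece;    # core module; its localtime returns an object

# Hypothetical helper: build "_YYYY\M\" from an epoch timestamp.
sub outdir_from_piece {
    my ($mtime) = @_;
    my $t = localtime($mtime);

    # year is the full four-digit year, mon is 1-12 with no padding.
    return sprintf "_%d\\%d\\", $t->year, $t->mon;
}
```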

      All excellent points; the only issue I see is that one of your suggestions was written wrong. You turned my basic arithmetic operation into a variable with an assignment operator without adding an = sign:

      $outdir = "_" . ($year+1900) . "\\" . ($mon+1) . "\\" ;

      For a file modified today, Aug 26 2011 that would return

      $year = 111
      $mon = 7
      $outdir = "_" . ($year+=1900) . "\\" . ($mon+=1) . "\\" ;

      That will return what we're actually looking for, though I typically just use $mon++ for that anyway, and they both give:

      $year = 2011
      $mon = 8

      In any case, thank you all for the suggestions. I clearly overlooked some important issues too, though oddly enough everything still worked OK. I'll rewrite it for simplicity.

        one of your suggestions was written wrong

        No, it is not written wrong.    Did you actually try it?

        $ perl -le'
        $year = 111;
        $mon = 7;
        $outdir = "_" . ($year+=1900) . "\\" . ($mon+=1) . "\\";
        print $outdir;
        '
        _2011\8\
        $ perl -le'
        $year = 111;
        $mon = 7;
        $outdir = "_" . ($year+1900) . "\\" . ($mon+1) . "\\";
        print $outdir;
        '
        _2011\8\

        It produces the result that you require.
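The two forms build the same string; what differs is the side effect on the variables. A small sketch, using the same sample values as above:

```perl
use strict;
use warnings;

my ($year, $mon) = (111, 7);    # localtime values for August 2011

# Plain addition: the expression yields 2011 and 8, variables unchanged.
my $plain = "_" . ($year + 1900) . "\\" . ($mon + 1) . "\\";
my ($year_after_plain, $mon_after_plain) = ($year, $mon);

# Assignment operators: same string, but $year and $mon are modified.
my $assign = "_" . ($year += 1900) . "\\" . ($mon += 1) . "\\";

print "$plain\n";
print "$assign\n";
print "after +  : year=$year_after_plain mon=$mon_after_plain\n";
print "after += : year=$year mon=$mon\n";
```

So the += form only matters if $year and $mon are read again later; for building $outdir alone, plain addition is enough.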

Re: Moving files to subfolders based on their last modified date
by toolic (Bishop) on Aug 19, 2011 at 19:48 UTC
    In general, Perl is case-sensitive. If you want to use warnings, you should change:
    use Warnings ; # Cause we should
    to:
    use warnings ; # Cause we should
    "to make sure there is no confusion for anyone else reading it"
    Since you are concerned about such things, when I compile your code, I get all these warnings:
    perl -c 921312.pl
    Name "main::hour" used only once: possible typo at 921312.pl line 26.
    Name "main::blocks" used only once: possible typo at 921312.pl line 23.
    Name "main::mode" used only once: possible typo at 921312.pl line 23.
    Name "main::wday" used only once: possible typo at 921312.pl line 26.
    Name "main::uid" used only once: possible typo at 921312.pl line 23.
    Name "main::dev" used only once: possible typo at 921312.pl line 23.
    Name "main::isdst" used only once: possible typo at 921312.pl line 26.
    Name "main::atime" used only once: possible typo at 921312.pl line 23.
    Name "main::gid" used only once: possible typo at 921312.pl line 23.
    Name "main::size" used only once: possible typo at 921312.pl line 23.
    Name "main::blksize" used only once: possible typo at 921312.pl line 23.
    Name "main::ctime" used only once: possible typo at 921312.pl line 23.
    Name "main::rdev" used only once: possible typo at 921312.pl line 23.
    Name "main::nlink" used only once: possible typo at 921312.pl line 23.
    Name "main::yday" used only once: possible typo at 921312.pl line 26.
    Name "main::ino" used only once: possible typo at 921312.pl line 23.
    Name "main::min" used only once: possible typo at 921312.pl line 26.
    Name "main::sec" used only once: possible typo at 921312.pl line 26.
    You could simplify your code by removing all unused code.

    See also use strict and warnings.
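One possible way to drop the unused variables is a list slice that keeps only the field that is needed. This sketch assumes only the modification time matters; mtime_of is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the last-modified time of a file.
sub mtime_of {
    my ($path) = @_;

    # Element 9 of the list returned by stat is mtime.
    my $mtime = (stat $path)[9];

    # stat returns an empty list on failure, so the slice yields undef.
    die "Cannot stat '$path': $!\n" unless defined $mtime;

    return $mtime;
}
```

With this, none of $dev, $ino, $mode and the rest need to exist, so the "used only once" warnings disappear.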

Re: Moving files to subfolders based on their last modified date
by GrandFather (Saint) on Aug 19, 2011 at 23:03 UTC

    Always use strictures (use strict; use warnings; - see The strictures, according to Seuss). Your major logic flaw (using $outdir before it is assigned a value) would have been caught by strictures.
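A small sketch of how strict reports the problem. The snippet is compiled with a string eval only so the error can be captured and printed; in a real script the same message would appear at compile time and the script would not run.

```perl
use strict;
use warnings;

# A cut-down version of the faulty line: $outdir is never declared.
my $snippet = 'use strict; my $match = "a.txt"; my $newlocate = $outdir . $match; 1;';

my $ok  = eval $snippet;    # compile and run the snippet
my $err = $@;               # compile error, if any

print $ok ? "compiled\n" : "strict refused: $err";
```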

    Note that the $outdir flaw is more interesting than you might guess. Because $outdir is a package variable, and therefore a global variable, it remembers the last file's directory and uses it for the current one so most files will go to the wrong place!
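The effect can be simulated without touching any files. In this sketch the file names and directories are made up, and the loop keeps the original order of operations: the destination is built before $outdir is updated.

```perl
use strict;
use warnings;

# Simulated (name, correct directory) pairs; no files are touched.
my @files = (
    [ 'a.txt', '_2011\\6\\' ],
    [ 'b.txt', '_1998\\3\\' ],
    [ 'c.txt', '_2011\\8\\' ],
);

our $outdir;         # package variable, as in the original script
my @destinations;

for my $file (@files) {
    my ($name, $correct_dir) = @$file;

    # Original order: destination is built from the previous $outdir.
    my $newlocate = ($outdir // '') . $name;
    $outdir = $correct_dir;

    push @destinations, $newlocate;
}

print "$_\n" for @destinations;
```

The first file gets no directory at all, and each later file lands in the directory computed for the file before it. Files only end up in the right place when consecutive files happen to share a month.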

    True laziness is hard work