Zenkyoki has asked for the wisdom of the Perl Monks concerning the following question:

Monks, I've been working on a script that will use File::Find to traverse a directory tree, deleting files older than 30 days since last modified. The traversal works fine, and the deletions work to a certain extent, but it just seems like I'm missing about half of the files that need to be deleted... Here is most of the code, please provide suggestions and criticisms.
use File::Find; my @time = localtime(); $time[5] += 1900; $time[4]++; my $curmonth = $time[4]; my $today = $time[3]; my $three = $curmonth - 0; my $year = $time[5]; my @filearray; my ($dirname, $outputfile) = <@ARGV>; chomp($dirname); chomp($outputfile); print "Beginning... "; my $filecounter = 0; open OUTPUT, ">$outputfile"; print "\nFrom the Directory: ", $dirname; print OUTPUT "\nFrom the Directory: ", $dirname, "\n"; opendir DIR, $dirname; chdir($dirname); my @directories = readdir (DIR); find(\&wanted, @directories); sub wanted { -l && !-e && print "bad link: $File::Find::name\n"; my $filepass = $File::Find::name; my $sendcheck = $File::Find::name; if ($filepass eq ".."){ print "\n\n\tFrom $dirname -- Files/Directories Scanned: ",$filecounter, "\n"; close OUTPUT; exit;}else{ &CheckMods($sendcheck); } } sub CheckMods { my $file = shift; my $tempvar = $_; my @info = stat($tempvar); my $last_access = $info[8]; my $last_modification = $info[9]; $filecounter++; if (@info){ my @times = (localtime($last_modification)); $times[4]++; $times[5] += 1900; $notify =(" Date: "."$times[4]/$times[3]/$times[5]"); print "Checking folder... "; if ($times[5] < $year){ push(@filearray, $_); &delete(); }else{ if ($year = $times[5]){ if ($three > $times[4]){ push(@filearray,$file); &delete(); } } } } else { print OUTPUT "\n"; } return; } sub delete{ foreach (@filearray){ print OUTPUT "Deleting the following file: $_\n"; print "Deleting the following file: $_\n"; unlink || warn "Can not delete file $_: $!\n"; }return; } close OUTPUT;
UPDATE: When I run the script, it gives me the following message: "Cannot delete path/filename No such file or directory" It does successfully delete the appropriate files in the home directory, so it seems that it may be a traversal issue.

Replies are listed 'Best First'.
Re: Deleting Files in 2000
by Joost (Canon) on Oct 22, 2004 at 20:33 UTC
      Thanks for the suggestion, but when I changed it to return, the script simply sticks in a loop, going over the files that couldn't be deleted.
Re: Deleting Files in 2000
by olivierp (Hermit) on Oct 22, 2004 at 21:31 UTC
    Do the "skipped" files perhaps contain Unicode characters ?

    As PodMaster pointed out to me here, readdir will not return such files.
    HTH,
    --
    Olivier
Re: Deleting Files in 2000
by graff (Chancellor) on Oct 24, 2004 at 06:29 UTC
    There are a number of strange things in your code, and some are clearly mistakes. First, I've never seen this idiom before:
    my ($dirname, $outputfile) = <@ARGV>;
    That's doing "glob" expansion on the elements of @ARGV (see "perldoc -f glob"), and saving the first two elements as the $dirname and $outputfile. It would make more sense to leave the angle brackets out, then actually check the contents of @ARGV before trying to use them, and print a brief but helpful message when the args are missing or inappropriate (e.g. if the first arg isn't a directory, or there is no second arg); also, "chomp" is unnecessary on @ARGV values (it does no damage, but you don't need it here).

    You take it for granted that "$dirname" really contains the name of a directory, and you pass it to "opendir" without checking for failure. Then you take it for granted that all the entries in this directory are directories as well, and pass them all to File::Find without checking whether any of them are just data files.

    Most of your trouble is in the CheckMods sub; you don't check to see whether you're dealing with a data file or a directory (this matters), your approach to testing file age is wrong, you have a "if ( $x = $y )" where you meant to have "if ($x == $y)" (but lucky for you, this test was both pointless and harmless), and you're doing a lot of unnecessary work -- some of which is causing the error messages.

    Check out the "-M" function (under "perldoc -f -X", the "file test" operations); it returns the age of a given file in days (as a floating-point number) relative to the time when the script began to run. If the value returned by "-M" is greater than 30, it meets your stated criterion for deletion. You don't need all that localtime and stat stuff at all. (And that stuff doesn't implement your criterion anyway: as written, your script will delete a file dated June 30 if you run it on July 1.)

    Your log messages to the output file and STDOUT will show you that file names are being repeated over and over when the "delete" sub is called; this happens because for each deletable file, CheckMods pushes it onto the array, then calls "delete()", which processes the whole array. This means that on every call after the first one, the "delete" sub is trying to unlink files that were already removed in a previous call.

    Either don't loop over the array in the delete sub, or else don't call the delete sub from within the "wanted" logic (wait till the traversal is done and the array is fully populated, then go over the array just once to delete the files in the list).

    Here's how I would do it, without using an array to keep track of the files (i.e., deleting them as they are found):

    use strict; use File::Find; die "Usage: $0 startdir, logfile\n" unless ( @ARGV == 2 and -d $ARGV[0] ); open( LOG, ">$ARGV[1]" ) or die "$ARGV[1]: $!\n"; chdir $ARGV[0]; my $age_limit = 30; # maybe take this from @ARGV as well? find( \&log_and_unlink, "." ); sub log_and_unlink { # if ( -l && !-e _ ) { # strike this block ... # print "Stale link: $File::Find::name\n"; # } # elsif # ... (see update paragraph below) if ( -f && -M _ > $age_limit ) { print LOG "deleted $File::Find::name\n"; unlink; } elsif ( -f _ ) { print LOG "kept $File::Find::name (less than $age_limit days o +ld)\n"; } else { print LOG "skipping $File::Find::name (not a data file)\n"; } } # updated to fix indentation: proper indentation IS important.
    (Note that after I've done "-l" the first "-f", all other "-X" operations are given "_" (underscore) as the arg -- this queries the same stat structure that was created by "-l" the first "-f", rather than repeating the stat call on the same file over and over.)

    My version will never delete directories -- it's just safer that way. You can do a separate scan, after the old data files are gone, to remove directories that contain no data files, if that needs to be done. A directory's modification time changes when files are deleted (or created or renamed), and it's possible to modify the content of a data file without changing the mod time of the directory that contains it. So deleting directories based on their age is basically wrong.

    update: Having just done a bit more checking (on macosx), I found that the "(-l and !-e)" test does not work as expected for identifying symlinks with non-existant targets (maybe non-BSD-based systems behave differently?). If it's important for you to identify stale symlinks, you'll want to test carefully and try alternatives if necessary (e.g. "-l and !-f _ and !-d _").