How do I find and delete files based on age?

macvsog has asked for the wisdom of the Perl Monks concerning the following question:

Re: How do I find and delete files based on age? by davorg (Chancellor) on Feb 26, 2007 at 14:58 UTC
(Please use a more descriptive title for your questions.)

In pseudocode, your solution would be something like this:

* Get the date two weeks ago (using time and some basic arithmetic to subtract 14 days' worth of seconds)
* Open a directory handle with opendir
* For each file read from the directory handle with readdir...
  * Ignore the file unless it starts with DATE_ (check with a regular expression)
  * Parse the date and time out of the filename (using a regex)
  * If the parsed date is earlier than the two-weeks-ago date that you calculated at the start, delete the file
  * Read the next file from the directory handle

-- <http://dave.org.uk> "The first rule of Perl club is you do not talk about Perl club." -- Chip Salzenberg
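The pseudocode above could be sketched in Perl roughly as follows. This is only a sketch under assumptions: the thread never states the exact filename format, so the layout `DATE_YYYYMMDD_HHMMSS` is hypothetical, as are the sub names `parse_stamp` and `old_files`. It lists candidates rather than deleting them.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Hypothetical filename layout: DATE_YYYYMMDD_HHMMSS (adjust the regex to yours).
# Returns the embedded timestamp as epoch seconds, or undef if there is none.
sub parse_stamp {
    my ($name) = @_;
    my @f = $name =~ /^DATE_(\d{4})(\d\d)(\d\d)_(\d\d)(\d\d)(\d\d)/
        or return;
    my ($y, $mon, $d, $h, $min, $s) = @f;
    # timelocal croaks on impossible dates, so treat those as "no timestamp"
    return scalar eval { timelocal($s, $min, $h, $d, $mon - 1, $y) };
}

# Return the DATE_ files in $dir whose embedded timestamp is before $cutoff.
sub old_files {
    my ($dir, $cutoff) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!\n";
    my @old;
    while (defined(my $name = readdir $dh)) {
        my $stamp = parse_stamp($name);
        next unless defined $stamp;          # not one of our files
        push @old, "$dir/$name" if $stamp < $cutoff;
    }
    closedir $dh;
    return sort @old;
}

my $two_weeks_ago = time - 14 * 24 * 60 * 60;
# Report only; swap print for unlink once the list looks right.
print "$_\n" for old_files(@ARGV ? $ARGV[0] : '.', $two_weeks_ago);
```

Comparing the timestamp in the name (rather than the file's mtime) follows the pseudocode; if the names and mtimes can disagree, decide which one you trust before deleting anything.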
Re: How do I find and delete files based on age? by philcrow (Priest) on Feb 26, 2007 at 14:43 UTC
Perl provides a lot of its functionality through external modules like File::Find. Some of them ship with Perl, others are available for download from CPAN, like DateTime. Those two modules will probably solve your problems. Some people prefer File::Find::Rule over File::Find.

Phil
Re^2: How do I find and delete files based on age? by Anonymous Monk on Mar 09, 2011 at 22:47 UTC
that totally works
Re: How do I find and delete files based on age? by andye (Curate) on Feb 26, 2007 at 15:04 UTC
Hi macvsog,

You won't have a problem once you get into it; this is bread-and-butter stuff for Perl. Things you'll want to take a look at: opendir, readdir, stat, time, unlink, and system. As others have pointed out, there's a bunch of modules to help you with this kind of thing too. Best of luck with it. HTH!

Update: oh, and maybe substr as well, if you don't want to get into regular expressions quite yet...
Re: How do I find and delete files based on age? by Fletch (Bishop) on Feb 26, 2007 at 14:44 UTC
Familiarity with POSIX and/or C can help in knowing what to look for. You're interested in stat for retrieving file information, or possibly the `-M` operator (documented in perlfunc as well, type `perldoc -f -X` at your shell prompt). And File::Find or File::Find::Rule will help you traverse your filesystem to get the victim directories' names. File::Path has routines for blowing away directory trees, but it may be just as simple to shell out to `rm -rf ./blah` via system.
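To make the `stat` versus `-M` distinction concrete, here is a small sketch that also uses File::Path's `remove_tree` instead of shelling out to `rm -rf`. The sub names `age_in_days` and `remove_if_older` are made up for illustration, and the paths come from the command line.

```perl
use strict;
use warnings;
use File::Path qw(remove_tree);

# Age in days from the modification time, via stat (element 9 is mtime).
sub age_in_days {
    my ($path) = @_;
    my $mtime = (stat $path)[9];
    return unless defined $mtime;          # path missing or unreadable
    return (time - $mtime) / 86_400;
}

# Remove $path (a file or a whole directory tree) when it is older than $days.
# Returns true if it was removed.
sub remove_if_older {
    my ($path, $days) = @_;
    my $age = age_in_days($path);
    return 0 unless defined $age && $age > $days;
    remove_tree($path);                    # stands in for `rm -rf`
    return !-e $path;
}

# -M gives roughly the same number, but measured from script start time ($^T).
printf "%s: %.1f days (stat), %.1f days (-M)\n", $_, age_in_days($_), -M $_
    for grep { -e $_ } @ARGV;
```

Run as written it only prints ages; nothing is removed unless you call `remove_if_older` yourself.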
Re: How do I find and delete files based on age? by scorpio17 (Canon) on Feb 26, 2007 at 15:11 UTC
As you know, "there's more than one way to do it...", but this may help get you started:

    #!/usr/bin/perl
    use strict;
    use File::Find;

    if ($ARGV[0] eq "") { $ARGV[0]="."; }

    my @file_list;
    find ( sub {
      my $file = $File::Find::name;
      if ( -f $file && $file =~ /^DATE_/) {
        push (@file_list, $file)
      }
    }, @ARGV);

    my $now = time();          # get current time
    my $AGE = 60*60*24*14;     # convert 14 days into seconds
    for my $file (@file_list) {
      my @stats = stat($file);
      if ($now-$stats[9] > $AGE) {   # file older than 14 days
        print "$file\n";
      }
    }

Assuming you name this script cleanup.pl, you would use it like this:

    cleanup.pl /var/backups/repository

If you don't specify a directory, it will use whatever the current directory is. Note that the stat function returns a list of info, which I'm saving into the @stats array. Element 9 contains the last modification time, which may be different from the actual creation time (read up on stat so you know which one you want to use). Also, this example just prints out the files starting with DATE_ that are older than 14 days. Change the print statement to:

    unlink $file;

to actually delete them. This may leave you with empty directories, but you can write another script to delete empty directories after running this one.
Re^2: How do I find and delete files based on age? by blazar (Canon) on Feb 28, 2007 at 11:40 UTC
`use strict;`

Why not `use warnings;` as well?

`if ($ARGV[0] eq "") { $ARGV[0]="."; }`

Later on you say: "If you don't specify a directory, it will use whatever the current directory is." Had you turned warnings on, this would trigger an `'uninitialized'` warning. Which is sensible: `$ARGV[0]` would actually be undefined rather than strictly equal to the empty string. I would use the simpler

`@ARGV = '.' unless @ARGV;`

so that all the directories supplied on the command line would be searched, and a reasonable default would be provided if none is specified. Granted: this is not meant as a harsh critique of your code. I know it is just an example. I only want to expand a little bit on the subject.

`my @file_list; find ( sub { my $file = $File::Find::name; if ( -f $file && $file =~ /^DATE_/) { push (@file_list, $file) } }, @ARGV);`

Two things:

* I like to use File::Find's `no_chdir` mode, so that I wouldn't need `$File::Find::name`. As of now your code is actually wrong: since `find()` changes directory, `$file`, which is a path relative to the base dir being searched, will be interpreted relative to the cwd, and `-f` will most likely fail, except by coincidence.
* I used to write such code too, which first collects filenames and then processes them later. If huge volumes of files are to be skimmed through, though, this may make the script seemingly "hang" before it says something interesting. Thus nowadays I avoid doing so, if possible. In this particular case I see no reason why the check on the date couldn't be made in the sub that is supplied to `find()` in the first place. (OK, the resulting code wouldn't do exactly the same as yours, the difference being a few seconds or at most minutes whereas the threshold is measured in days, so I wouldn't regard it as significant.)

`my @stats = stat($file); if ($now-$stats[9] > $AGE) { # file older than 14 days`

I know you probably know, and include an intermediate step for clarity and instructive purposes, but it is perhaps worth reminding that one can take a list slice as well, so the temporary `@stats` variable is not needed:

`if ($now-(stat $file)[9] > $AGE) { # file older than 14 days`

BTW: I am the first one to say one shouldn't care about premature optimization, but stats are known to be expensive, and `$file` is already statted when it's being searched, so that is one more reason to do the check at `find()` time.
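Putting those suggestions together (`use warnings`, the `@ARGV` default, `no_chdir`, and doing the age check inside the callback), one possible shape is sketched below. It is an illustration, not the code either poster wrote: the sub name `each_old_file` and the `$MAX_AGE` variable are hypothetical, and it matches `DATE_` against the base name because with `no_chdir` the callback sees full paths.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Basename qw(basename);

my $MAX_AGE = 14 * 24 * 60 * 60;        # 14 days in seconds

# Walk @dirs and call $action on each plain file named DATE_* older than $MAX_AGE.
sub each_old_file {
    my ($action, @dirs) = @_;
    my $now = time;
    find({
        no_chdir => 1,                   # $_ is then the full path, no cwd surprises
        wanted   => sub {
            return unless basename($_) =~ /^DATE_/;
            my $mtime = (lstat $_)[9];   # one lstat, reused by -f _ below
            return unless defined $mtime && -f _;
            $action->($_) if $now - $mtime > $MAX_AGE;
        },
    }, @dirs);
}

@ARGV = ('.') unless @ARGV;
# Report only; pass sub { unlink $_[0] } once the output looks right.
each_old_file(sub { print "$_[0]\n" }, @ARGV);
```

Because each file is reported as soon as it is found, the script produces output straight away even on large trees.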
Re: How do I find and delete files based on age? by duckyd (Hermit) on Feb 26, 2007 at 20:44 UTC
If your task really is as simple as you describe (and you don't anticipate it becoming more complicated later on) then there's no reason not to just use find:

`find ./ -name 'DATE_*' -mtime +14 -exec rm -rf {} \;`

Back up first, test before you run (without the `-exec rm -rf {} \;`) to verify it finds the right files, etc., etc.
Re^2: How do I find and delete files based on age? by bsdz (Friar) on Feb 27, 2007 at 14:17 UTC
I have never used it but I believe find2perl will convert the above command into pure perl. Just another option :)
Re^2: How do I find and delete files based on age? by 0xbeef (Hermit) on Feb 28, 2007 at 08:41 UTC
The use of a relative path is a good thing, but this is incomplete. Paranoia should take control when you use a destructive command, and you should never make assumptions. Here is a simplified example where the tests are inadequate:

    cd /targetdir/targetsubdir
    rm -fr *

Imagine if the target directory was not mounted, or your chdir failed for whatever reason (e.g. inadequate permissions). Yes, you are likely now listening to the whirr of your hard disk working feverishly to delete everything from the directory you were in prior to the failed cd, and I have personally witnessed cases of that particular directory being / . The safe approach is:

    > cd $TARGETDIR && rm -fr ./targetsubdir

or

    > test -d $TARGETDIR && find . -name 'DATE_*' -type f -mtime +14 -exec rm -fr {} \;

Niel
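The same "stop if the cd failed" rule carries over to Perl: check the return value of `chdir` before anything destructive runs. A minimal sketch, assuming a hypothetical helper named `remove_subdir_safely` and using File::Path's `remove_tree` in place of `rm -fr`:

```perl
use strict;
use warnings;
use File::Path qw(remove_tree);

# Enter $target_dir and remove ./$subdir, but only if the chdir really succeeded.
# Returns true if the subdirectory was removed, false if it was not there.
sub remove_subdir_safely {
    my ($target_dir, $subdir) = @_;
    die "Refusing: '$subdir' must be a plain directory name\n"
        if $subdir eq '' || $subdir =~ m{/} || $subdir =~ /^\.\.?$/;
    chdir $target_dir
        or die "Cannot chdir to $target_dir: $!\n";   # stop here, delete nothing
    return 0 unless -d "./$subdir";
    remove_tree("./$subdir");
    return !-e "./$subdir";
}

# Usage: script.pl TARGETDIR SUBDIR
if (@ARGV == 2) {
    print remove_subdir_safely(@ARGV) ? "removed\n" : "nothing to remove\n";
}
```

The extra check on `$subdir` is an added precaution, not something from the post: it keeps `.`, `..`, or a path with slashes from widening what gets deleted.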
Re: How do I find and delete files based on age? by talexb (Chancellor) on Feb 27, 2007 at 04:22 UTC
You're developing for Linux/Unix, right? I'm surprised that no one's mentioned `tmpwatch` yet. It's a tool that is specifically written to get rid of files older than a particular age. True, there's no Perl involved -- but sometimes the best answer is to not use Perl at all.

Update: .. And here's a link to the first page that Google found for `tmpwatch`. You can also find information about it by typing `man tmpwatch` on your Unix/Linux system.

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds
Re^2: How do I find and delete files based on age? by Anonymous Monk on Feb 28, 2007 at 00:22 UTC
WARNING: The following text is much like the ramblings of an old man, etc ...

Ahhh ... the ol' "remove old files" script. Had to do a few of these, over the years, at numerous companies. It is usually a result of the file system filling up with "old" log files, etc. (e.g. log001, log002, etc.). The easiest thing to do back then was to run the one-liner Unix "find" command with "-mtime" followed by the "-exec" or "-delete" flag (see http://unixhelp.ed.ac.uk/CGI/man-cgi?find). Just an option to consider ...

Anyway, the important thing here is that in the early days I made a complete mess of things when I didn't TEST the script first. Nowadays I have a scheduled task that archives/zips the old (1 week) files to another directory (much like a trashcan). Another task removes these files from the archive directory sometime later (if they are 2 or more weeks old). Of course, you can manually delete the files at any time knowing that they have been backed up on tape by the system administrators - right?

Some tips (hopefully it's not too late ...). In your script (Perl or otherwise):

* Have an option to only display the files to be deleted - or to display before deletion (with a confirmation)
* Have an option to archive rather than delete old files (e.g. move to another directory and gzip)
* Have an option to "restore" files from the archive (i.e. an "undelete")
* As you become more confident, allow handling of files via "regular expression" - handy for file names containing unusual characters or spaces, etc.
* Perhaps you can consider searching for files based on file attributes such as file sizes and (modified) dates (use ranges) and file types
* Log what has been archived (or restored) or deleted, the time and the user id
* If this is your first script - ever - and you are basically performing a "rm . www.*" then for goodness sake do not put your name on the script!

- Laz.
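A few of these tips (display-only by default, archive instead of delete, log what happened) can be sketched in Perl as below. Everything named here is an assumption for illustration: the option names `--archive-dir`, `--days`, and `--really`, and the helper `archive_file`. Gzipping and the "restore" option are left out to keep it short.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;
use File::Copy qw(move);
use File::Basename qw(basename);

# Move $file into $archive_dir unless $dry_run; write a log line either way.
# Returns 1 if the file was moved, 0 for a dry run.
sub archive_file {
    my ($file, $archive_dir, $dry_run, $log) = @_;
    my $dest = "$archive_dir/" . basename($file);
    if ($dry_run) {
        print {$log} "would archive $file -> $dest\n";
        return 0;
    }
    die "Refusing to overwrite $dest\n" if -e $dest;
    move($file, $dest) or die "Cannot move $file: $!\n";
    print {$log} scalar(localtime), " uid ", $<, " archived $file -> $dest\n";
    return 1;
}

my ($archive_dir, $really);
my $days = 7;
GetOptions(
    'archive-dir=s' => \$archive_dir,
    'days=i'        => \$days,
    'really'        => \$really,        # without --really nothing is moved
) or die "usage: $0 --archive-dir DIR [--days N] [--really] FILE...\n";

if (defined $archive_dir) {
    for my $file (grep { -f $_ && -M $_ > $days } @ARGV) {
        archive_file($file, $archive_dir, !$really, \*STDOUT);
    }
}
```

Making the dry run the default means a mistyped command line shows what would happen instead of doing it.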
Re: How do I find and delete files based on age? by Moron (Curate) on Feb 26, 2007 at 15:14 UTC
Untested Perl one-liner:

`perl -e '-M $_ > 14 and system "rm -rf $_" for glob "/var/backups/repository/DATE_*";'`

Note: this is a deliberately simplistic example. To optimise it to avoid shelling out (using either File::Find or putting the glob in a recursive subroutine) would require more Perl experience than the OP suggests.

Update: Newbies will particularly benefit from Learning Perl, Third Edition - Making Easy Things Easy and Hard Things Possible, by Randal L. Schwartz & Tom Phoenix.

-M Free your mind
Re^2: How do I find and delete files based on age? by graq (Curate) on Feb 26, 2007 at 16:28 UTC
I would also strongly advise any 'Newbies', especially those unfamiliar with *nix environments, not to run this one-liner before making backups. Because it deletes things. And things that delete other things should always be tested first.

-=( Graq )=-
Re^3: How do I find and delete files based on age? by Moron (Curate) on Feb 26, 2007 at 16:36 UTC
Surely the backup is more likely to go wrong than the one-liner! Instead, a newbie should only be working unsupervised on a non-production machine, and backups should be done daily (and in this case also on demand) by properly qualified staff for all machines, including non-production. Moreover, anybody, not just newbies, can make a mistake, and it is rather comforting to have the capability to call your friendly sysadmin to restore the damaged goods to a previous state if there's no quicker repair available. I remember having a misunderstanding in a Q/A system over the first digit in an identifier once and deleting everything BUT the data I was supposed to be deleting. Fortunately, a phone call and ten minutes later it was back to where it was.

-M Free your mind
Re^2: How do I find and delete files based on age? by exussum0 (Vicar) on Feb 26, 2007 at 17:32 UTC
Touching on what was replied to you, I'd suggest replacing "rm -rf $_" with "echo rm -rf $_". At least then you can audit it until it works perfectly. ^^
Re^3: How do I find and delete files based on age? by TGI (Parson) on Feb 26, 2007 at 20:51 UTC
I don't count myself as a newbie, but I always do a test like you suggest (print plain output) before unleashing a deletion script on my system, even if it's just a script to clean out a temp directory. It only takes a moment to make sure that you will be deleting what you think you will be deleting, and it is easily worth the time. How long will it take you to restore your files from backup? You do have backups, right?

TGI says moo
Re^3: How do I find and delete files based on age? by Moron (Curate) on Feb 26, 2007 at 17:47 UTC
Yes, that I do agree with. And "print" if it's not a shell-out. I also quite often put an echo in front of the perl -e (update: for long or multiple lines being typed in) to check that I typed what I think I typed before actually running it.

-M Free your mind
Re: How do I find and delete files based on age? by Anonymous Monk on Feb 26, 2007 at 19:32 UTC
Look into the File::Glob module to get your file list (perldoc File::Glob). You can use regular expressions to look for filename matches (perldoc perlre); it'll look something like: `if ($filename =~ m/DATE_/)`. For the date comparison, are you going to use a system date, or is there some sort of naming convention that integrates the date into the file name? If the latter, regular expressions are your friend again.
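As a rough illustration of the glob-plus-regex approach, the sketch below uses `bsd_glob` from File::Glob and compares against the file's modification time (the "system date" option). The sub name `old_date_files` is hypothetical, it assumes the directory name contains no glob metacharacters, and it does not recurse into subdirectories.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Basename qw(basename);

# List plain files directly inside $dir whose base name starts with DATE_
# and whose modification time is more than $days days in the past.
sub old_date_files {
    my ($dir, $days) = @_;
    my $cutoff = time - $days * 24 * 60 * 60;
    return grep {
        -f $_
            && basename($_) =~ m/^DATE_/     # the binding operator is =~
            && (stat _)[9] < $cutoff         # reuse the stat done by -f
    } bsd_glob("$dir/*");
}

# Usage: script.pl DIR   (prints candidates only, deletes nothing)
print "$_\n" for @ARGV ? old_date_files($ARGV[0], 14) : ();
```

Anchoring the pattern with `^` matters: an unanchored `m/DATE_/` would also match names such as `UPDATE_log`.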

