bingeldac has asked for the wisdom of the Perl Monks concerning the following question:

With all of these new mega websites becoming more complex, I have stumbled on a problem that I do not know how to solve best. Suppose we have a very large directory structure of 5000 or so directories. If one needs to delete every file over two weeks old, what is the best way to do it? In that regard, does one use epoch seconds (or is there a better way)? Also, since there will be millions of files, is it best to build an array of them first and then delete, or to delete them as the program goes? Anyone have any suggestions on how to do this (perhaps some code guiding the way)? Show me the light.

Replies are listed 'Best First'.
Re: Deleting files over 2 weeks old?
by merlyn (Sage) on Aug 03, 2000 at 02:28 UTC
    A minor correction: some of the posters above used -C or "ctime" and mentioned them in the same breath as "create time". There's no create time in Unix. The "ctime" value is the "last inode change" value: the last time anything "changed" about the file, such as contents, permissions, ownership, or number of links (including renaming).

    "Files over two weeks old" is often an ambiguous specification. For deleting items from a cache, I suggest the "atime" value (most recent access), as it is often indicative of an orphan when there are no more accesses in a long time.

    As for using Perl to zap files with big atimes, I'd go for a command-line-written chunk of code:

    $ find2perl /some/dir -atime +14 -eval unlink > MyProggy
    $ chmod +x MyProggy
    $ ./MyProggy    # as you wish

    -- Randal L. Schwartz, Perl hacker

      An excellent point on the ctime, merlyn. However, why redirect and execute, as opposed to piping it into 'xargs', and using 'rm'?

      --Chris

      e-mail jcwren
        OK, let's take the classic root crontab code:
        find /tmp /usr/tmp -atime +7 -print | xargs /bin/rm -f
        And then I come along (as an ordinary user) and do the following:
        $ mkdir -p "/tmp/foo
        /etc"            # yes, that's a newline after foo, before the /etc
        $ touch "/tmp/foo
        /etc/passwd"     # yes, that's a newline after foo again
        And then sit back 7 days. Boom. You have no /etc/passwd.

        The problem is that you are using newline as a delimiter, and yet it is a legal filename character. You need find .. -print0 and xargs with a -0, but that's not portable. Even though Perl isn't strictly everywhere, it's everywhere the perlmonks are, so my solution succeeds in a safe way.

        -- Randal L. Schwartz, Perl hacker
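The safe pure-Perl approach merlyn describes can be sketched roughly as follows: File::Find hands each name to unlink directly, so no shell is involved and a filename containing a newline or space cannot be misparsed. The subroutine name, directory, and cutoff below are illustrative, not from the thread.

```perl
#!/usr/bin/perl
# Sketch only: delete plain files not accessed in more than $days days.
# prune_old() and '/some/cache/dir' are hypothetical names.
use strict;
use warnings;
use File::Find;

sub prune_old {
    my ($dir, $days) = @_;
    find(sub {
        return unless -f $_;
        # -A is the file's age in days since last access (atime),
        # as merlyn suggests for cache cleanup
        if (-A $_ > $days) {
            unlink $_ or warn "Can't unlink $File::Find::name: $!\n";
        }
    }, $dir);
}

# e.g. prune_old('/some/cache/dir', 14);
```

Because unlink takes the name as a plain string argument, even a file whose name contains a newline is removed correctly.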

Re: Deleting files over 2 weeks old?
by Shendal (Hermit) on Aug 03, 2000 at 00:42 UTC
    Use UNIX utils if you want to, but I prefer perl. Here's a code snippet:
    #!/usr/bin/perl -w
    use strict;
    use File::Find;

    my $dir  = shift;
    my $days = 14;

    die "Invalid dir" unless -d $dir;
    find(\&wanted, $dir);

    sub wanted {
        -f && (int(-C _) > $days) && print "$File::Find::name\n";
    }
    Of course, this will just print which files it would delete, not actually delete them. That's left to the reader. :)
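A minimal completion of that exercise might swap the print for an unlink, keeping the same test. This is a sketch only; anything beyond a warning on failure is omitted.

```perl
#!/usr/bin/perl
# Sketch: same wanted() test as the snippet above, but actually
# removing each matching file instead of printing its name.
use strict;
use warnings;
use File::Find;

my $days = 14;

sub wanted {
    return unless -f $_;
    # -C is days since the last inode change (not creation time)
    if (int(-C _) > $days) {
        unlink $_ or warn "unlink $File::Find::name: $!\n";
    }
}

# find(\&wanted, $dir);    # as in the snippet above
```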
(jcwren) RE: Deleting files over 2 weeks old?
by jcwren (Prior) on Aug 03, 2000 at 00:28 UTC
    I don't know if you're determined to use Perl or not to do this, but the Unix command 'find' will allow you to specify a number of days since create date or last access date. Pipe the output to 'xargs', and use 'rm'. Something like this:

    find . -ctime +14 | xargs rm -f

    This should find all files created more than 14 days ago, and delete them. You could put this in a daily cron job, and it'll all happen for you.

    Update: Don't do this! As merlyn points out in a following reply, this is a security risk. I'm a casual Unix user, and while I knew that filenames could contain odd characters, I didn't realize the implication of newlines in a file name. See below for more details.

    --Chris

    e-mail jcwren
      As an amusing side note: a friend of mine worked for a large local ISP that shall remain nameless. They used the following as production code:
      cd /some/path/name
      find . -mtime +30 -print | xargs rm
      The problem, of course, is that this was run as root from the root directory. My friend got paged out of a movie when /some/path/name turned out to be invalid. You can imagine what happened :)

      The other problem (and I haven't tested this, to be honest) is that the above code will not get rid of filenames with spaces. Of course, that's pretty obscure, so I doubt it would come up.

Re: Deleting files over 2 weeks old?
by ferrency (Deacon) on Aug 03, 2000 at 00:34 UTC
    If this were a UNIX system, I might skip Perl altogether, and use the standard find command instead.

    # cd root_directory_I_need_to_cull
    # find . -mtime +14 -exec rm {} \;
    (Note: # is root prompt, not comment mark :)

    You might need to escape the ; in your shell. You can specify -atime, -ctime, or -mtime to check the access time, inode change time (approximately file creation time), or modification time for the file.

    But since this is a Perl site, and not a Unix site, I'd recommend using the File::Find module, and perl's built in file testers -M, -C, and -A to filter out based on file times.

    Or, as perldoc File::Find says, you can use find2perl to change a "find" command into a perl code snippet using File::Find which does the same thing.
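As a rough illustration only (the real find2perl output differs in detail across Perl versions), a translation of something like find . -mtime +14 -print boils down to a File::Find wrapper of this shape, here restricted to plain files:

```perl
#!/usr/bin/perl
# Sketch of the kind of code find2perl emits for a command like
# "find . -mtime +14 -print"; not the literal generator output.
use strict;
use warnings;
use File::Find;

sub wanted {
    # -M _ is the age in days since last modification (mtime)
    if (-f $_ && -M _ > 14) {
        print "$File::Find::name\n";
    }
}

# find(\&wanted, '.');    # walk from the current directory
```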

    Options, options, options...

    Alan