Re: Quickest way to get the oldest file
by graff (Chancellor) on Jan 19, 2004 at 21:38 UTC
Well, given the requirement for rapid turn-around from the web server, the best solution might be to maintain a separate database of file names and creation dates. The cross-platform requirement means you need to decide whether you want to require a cross-platform RDBMS (like MySQL) or just accept a certain amount of performance degradation (extra load on the server) to maintain and sort a flat-file table.
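The flat-file variant of that idea might look like the following sketch. The index file name and the one-"epoch<TAB>name"-line-per-file format are assumptions, not anything from the original post: append a line when a file arrives, and answer the "oldest file" query from the index instead of scanning the directory.

```perl
use strict;
use warnings;

# Hypothetical flat-file index: one "epoch<TAB>filename" line per file.
my $index = 'file_index.txt';   # assumed index file name

# Record a new file when it arrives.
sub index_add {
    my ($name) = @_;
    my $mtime = ( stat $name )[9];           # epoch modification time
    open my $fh, '>>', $index or die "Can't append to $index: $!";
    print $fh "$mtime\t$name\n";
    close $fh;
}

# Answer the web server's query without touching the big directory:
# the oldest file is the line with the smallest epoch time.
sub index_oldest {
    open my $fh, '<', $index or die "Can't read $index: $!";
    my ( $best_time, $best_name );
    while (<$fh>) {
        chomp;
        my ( $t, $name ) = split /\t/, $_, 2;
        ( $best_time, $best_name ) = ( $t, $name )
            if !defined $best_time || $t < $best_time;
    }
    close $fh;
    return $best_name;
}
```

Deletions would also need to be reflected in the index (rewrite it, or mark lines dead), which is the usual cost of keeping a cache next to the data.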
If the database approach doesn't appeal to you (don't worry -- most people would prefer not to have to do that), the response time will depend on how many files are present in any single directory. The following sort of code ought to do what you want on either type of server:
sub get_oldest_file
{
    my $path = shift;   # name of directory to search
    opendir( D, $path ) or return "Unable to read directory $path\n";
    my $oldest_age  = 0;
    my $oldest_name = '';
    for ( readdir( D ))
    {
        next unless ( -f "$path/$_" );
        my $age = ( -M _ );   # note the "_": uses stat data loaded by "-f" above
        if ( $age > $oldest_age ) {
            $oldest_name = $_;
            $oldest_age  = $age;
        }
    }
    closedir D;
    return $oldest_name;
}
update: naturally, you'll want to use the "File::Spec" modules when you go to cross-platform usage.
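A minimal sketch of that File::Spec usage (the directory and file names here are made up for illustration): instead of hard-coding "/" as in "$path/$_" above, let File::Spec assemble the path with whatever separator the current platform uses.

```perl
use strict;
use warnings;
use File::Spec;

# Build "$path/$_"-style paths portably across *nix and Windows.
my $path = 'incoming';                    # hypothetical directory
my $name = 'report.txt';                  # hypothetical file name
my $full = File::Spec->catfile( $path, $name );
print "$full\n";
```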
Re: Quickest way to get the oldest file
by hardburn (Abbot) on Jan 19, 2004 at 21:06 UTC
my ($oldest) =
    map  { $_->[0] }
    sort { $b->[1] <=> $a->[1] }
    map  { [ $_, -M $_ ] }
    <*>;
The above isn't very memory-efficient, but is probably as quick as you can hope for. Change -M to -A or -C if necessary (see the relevant Perl docs). You might also want to check that the file is not a directory. The <*> might have to be changed to read a directory other than the current one.
---- I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
-- Schemer
: () { :|:& };:
Note: All code is untested, unless otherwise stated
Not only is this not memory-efficient, it's not speed-efficient either. There's no need to sort the entire list just to find the largest value. Even an efficient sort requires n*log(n) comparisons, and with 50,000 elements the log(n) factor starts to become non-negligible.
Other suggestions here that walk down the list and keep track of the largest value seen so far are much more efficient in both speed and memory: they need only n comparisons and negligible additional memory. It's a win-win situation.
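The single-pass scan described above can be sketched like this (the subroutine name is mine; the logic mirrors graff's code elsewhere in the thread): one trip through the directory, one running maximum, no sort.

```perl
use strict;
use warnings;

# Single-pass scan: O(n) comparisons, O(1) extra memory.
# Returns the name of the file with the largest -M value
# (most days since modification), i.e. the oldest file.
sub oldest_in {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Can't open $dir: $!";
    my ( $oldest, $max_age ) = ( undef, -1 );
    while ( defined( my $name = readdir $dh ) ) {
        next unless -f "$dir/$name";
        my $age = -M _;                       # reuse stat buffer from -f
        ( $oldest, $max_age ) = ( $name, $age ) if $age > $max_age;
    }
    closedir $dh;
    return $oldest;
}
```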
Re: Quickest way to get the oldest file
by mr_mischief (Monsignor) on Jan 19, 2004 at 21:20 UTC
You'd want to check the modification time or date for all the files. Then you'd want to do a typical minimum function on the epoch times at which they were modified (using data from stat()) or a maximum function on the number of days since the file was last modified (using data from -M for example). Since the -X file test ops handle partial days, I usually just use those instead of stat if I only need one value.
A quick and dirty test for the oldest file in a directory follows. It currently makes no check to see if it's pointing to a regular file or something else (socket, directory, etc.). It just prints the name of the oldest file and the number of days since it was modified. I tested it a little and it seems to work.
use strict;
use warnings;

my $max = 0;
my ( $time, $oldest );

opendir( my $d, '.' ) || die "Can't get file list: $!\n";
my @file = readdir( $d );
closedir( $d );

foreach ( @file )
{
    next if /^\.{1,2}$/;
    $time = -M $_;
    if ( $time > $max )
    {
        $max    = $time;
        $oldest = $_;
    }
}
print "$oldest was modified $max days ago\n";
Re: Quickest way to get the oldest file
by derby (Abbot) on Jan 19, 2004 at 21:04 UTC
I'm not sure you can get one solution for this due to differing filesystems on differing platforms. Even on *Nix platforms you cannot reliably do this (depending on your definition of oldest and file). I've always found it easier to control file naming but ... this snippet might help (at least on *Nix).
#!/usr/bin/perl
use IO::Dir;

tie %dir, 'IO::Dir', ".";

# assume ctime is close enough for creation time
foreach ( sort { $dir{$b}->ctime <=> $dir{$a}->ctime } keys %dir ) {
    print $_, " ", $dir{$_}->ctime, "\n";
}
-derby
update: Of course jweed got what I was talking about
(admittedly in a round-a-bout way). The OP wanted the oldest created file. In *Nix world, you cannot do that. There are three times associated with a file - last access, last modified and last change time. The -M solutions that follow will work
as long as your concept of modification and creation are the same. I can
create a file on Jan 1st, another on June 1. If I modify the Jan 1st file (and possibly in some non-significant manner) on Aug 1, then by the -M method, the Jan 1 file will be newer than the June 1 file. See - it all depends on how you define oldest. If your files are created but never modified (such as a caching scheme) then the -M works fine; however, if there is a chance that the files will be modified and you don't count the
modifications as making the file newer then checking the ctime
is probably better. But then again, that can have issues to ... so that's why
I normally suggest a naming convention (date/time) to remove ambiguity.
Even on *Nix platforms you cannot reliably do this
To clarify: (from the unix-faq):
3.1) How do I find the creation time of a file?
You can't - it isn't stored anywhere. Files have a last-modified time (shown by "ls -l"), a last-accessed time (shown by "ls -lu") and an inode change time (shown by "ls -lc"). The latter is often referred to as the "creation time" - even in some man pages - but that's wrong; it's also set by such operations as mv, ln, chmod, chown and chgrp.
The man page for "stat(2)" discusses this.
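In Perl, the same three timestamps the FAQ describes can be read straight from stat(); a minimal sketch (the choice of example file is arbitrary):

```perl
use strict;
use warnings;

# stat() returns atime, mtime and ctime as elements 8, 9 and 10:
# last access, last modification, and last inode change.
# None of them is a true creation time.
my $file = $0;   # use this script itself as a convenient example file
my ( $atime, $mtime, $ctime ) = ( stat $file )[ 8, 9, 10 ];
print "last access:   ", scalar localtime($atime), "\n";
print "last modified: ", scalar localtime($mtime), "\n";
print "inode change:  ", scalar localtime($ctime), "\n";
```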
Re: Quickest way to get the oldest file
by Roy Johnson (Monsignor) on Jan 19, 2004 at 21:06 UTC
Re: Quickest way to get the oldest file
by BrowserUk (Patriarch) on Jan 20, 2004 at 09:06 UTC
It will take a little memory, but it cracks on a bit :)
#! perl -slw
use strict;
use List::Util qw[ reduce ];

die "Usage: $0 <dir>" unless @ARGV > 0;

print reduce { -C $a > -C $b ? $a : $b } glob for @ARGV;
Or as a one-liner
perl -MList::Util=reduce -le"print reduce{-C $a>-C $b?$a:$b}glob for @ARGV;" cache/*/*/*/*
Caveat: Will need adjustments for *nix.
Re: Quickest way to get the oldest file
by duff (Parson) on Jan 20, 2004 at 03:04 UTC
Others have suggested you use a database and I tend to agree with them. It's unclear whether you meant 50,000 files or 50,000 directories each with some number of files but either way, searching through that many entries in a filesystem is slow (and hopefully it's not 50,000 files all in one directory!) Another mechanism that you might be able to use is encoding the time information in the file/directory names themselves. Something like:
2004/01/19/00/filename.01
2004/01/19/00/filename.02
2004/01/19/00/filename.03
...
2004/01/19/01/filename.01
...
2004/01/19/02/filename.01
...
In this hypothetical example the files are arranged by year/month/day/hour/filename.minute. You can see that it would be relatively easy to find the oldest file if you could arrange for such a structure. I don't know exactly whether this technique would be useful for your problem, but there it is. Or you could just use a database like PostgreSQL, MySQL, Berkeley DB, etc. (I believe all of these are available on both Linux and Windows) with an index created on the time of each entry. :-)
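Under a layout like that, lexical order matches chronological order, so the lookup reduces to a sorted glob; a sketch assuming duff's hypothetical year/month/day/hour/name.minute scheme (the subroutine name is mine):

```perl
use strict;
use warnings;

# Assumes the hypothetical year/month/day/hour/filename.minute layout,
# where sorting path names lexically also sorts them chronologically.
sub oldest_by_name {
    my ($root) = @_;
    my @files = sort glob("$root/*/*/*/*/*");
    return $files[0];   # the earliest path sorts first
}
```

No stat() calls at all; the directory structure itself carries the timestamps.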
Thanks all for your replies and helpful suggestions,
There will be multiple directories (but I will only be concerned with one at a time), the one that I will be using will be specified, and this may have up to 50,000 files in the one directory. This is the most that we have ever seen in one directory, but it is normally more like 1000. I do want to plan for the worst case scenario though. Unfortunately a file and directory naming structure like the one you are suggesting is not an option, as this will be specified by the client.
I ran graff's code from above and found it to be a reasonable speed. Perhaps a little slow when tested against 50,000 files, but I only ran it on this slow machine.
The only other idea that I had was to keep a record of the oldest file in a text file, and then this will only need to be updated when a file in that directory is deleted, or more to the point when that particular file is deleted. So I could return a response to the client, and then continue on with finding the oldest file, in the background... Any thoughts?
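That cache-file idea might look something like the following sketch. The cache file name and the update policy (rescan whenever the cached name no longer exists) are assumptions; the background-rescan part would sit on top of this.

```perl
use strict;
use warnings;

# Hypothetical sketch of the OP's idea: answer from a one-line text
# file when possible, and rescan only when the cache is missing or
# the cached file has been deleted.
my $cache = '.oldest_cache';   # assumed cache file name

sub oldest_cached {
    my ($dir) = @_;
    my $cache_path = "$dir/$cache";

    # Fast path: trust the cache if the file it names still exists.
    if ( open my $fh, '<', $cache_path ) {
        chomp( my $name = <$fh> );
        close $fh;
        return $name if defined $name && -f "$dir/$name";
    }

    # Slow path: single-pass rescan, then rewrite the cache.
    opendir my $dh, $dir or die "Can't read $dir: $!";
    my ( $oldest, $max ) = ( undef, -1 );
    for my $f ( readdir $dh ) {
        next if $f eq $cache;              # don't count the cache itself
        next unless -f "$dir/$f";
        my $age = -M _;
        ( $oldest, $max ) = ( $f, $age ) if $age > $max;
    }
    closedir $dh;
    if ( defined $oldest and open my $fh, '>', $cache_path ) {
        print $fh "$oldest\n";
        close $fh;
    }
    return $oldest;
}
```

The appeal is exactly what the OP describes: the web server's request is answered from one tiny file, and the expensive directory scan happens only after a deletion invalidates the cache.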
(z) Re: Quickest way to get the oldest file
by zigdon (Deacon) on Jan 21, 2004 at 20:43 UTC
Sometimes, the first file in the directory is the oldest one:
my $file;
opendir( DIR, $dir ) or die "Can't read $dir: $!";
$file = readdir( DIR ) until defined $file && $file =~ /^[^.]/;
print "First file is $file\n";
closedir DIR;
This should be pretty fast and memory efficient, since it doesn't actually read the whole directory. Of course, it won't work if the files in the directory can be modified after being placed there, as the oldest file will no longer be the first.
| [reply] [d/l] |