Re: Quickest way to get the oldest file
by graff (Chancellor) on Jan 19, 2004 at 21:38 UTC
Well, given the requirement for rapid turn-around from the web server, the best solution might be to maintain a separate database of file names and creation dates. The cross-platform requirement means you need to decide whether you want to require a cross-platform RDBMS (like MySQL) or just accept a certain amount of performance degradation (extra load on the server) to maintain and sort a flat-file table.
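The flat-file variant of that idea might look like the following sketch. The index file name and the one-"epoch<TAB>name"-line-per-file format are assumptions, not anything from the original post: append a line when a file arrives, and answer the "oldest file" query from the index instead of scanning the directory.

```perl
use strict;
use warnings;

# Hypothetical flat-file index: one "epoch<TAB>filename" line per file.
my $index = 'file_index.txt';   # assumed index file name

# Record a new file when it arrives.
sub index_add {
    my ($name) = @_;
    my $mtime = ( stat $name )[9];           # epoch modification time
    open my $fh, '>>', $index or die "Can't append to $index: $!";
    print $fh "$mtime\t$name\n";
    close $fh;
}

# Answer the web server's query without touching the big directory:
# the oldest file is the line with the smallest epoch time.
sub index_oldest {
    open my $fh, '<', $index or die "Can't read $index: $!";
    my ( $best_time, $best_name );
    while (<$fh>) {
        chomp;
        my ( $t, $name ) = split /\t/, $_, 2;
        ( $best_time, $best_name ) = ( $t, $name )
            if !defined $best_time || $t < $best_time;
    }
    close $fh;
    return $best_name;
}
```

Deletions would also need to be reflected in the index (rewrite it, or mark lines dead), which is the usual cost of keeping a cache next to the data.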
If the database approach doesn't appeal to you (don't worry -- most people would prefer not to have to do that), the response time will depend on how many files are present in any single directory. The following sort of code ought to do what you want on either type of server:
sub get_oldest_file
{
    my $path = shift;   # name of directory to search
    opendir( D, $path ) or return "Unable to read directory $path\n";
    my $oldest_age  = 0;
    my $oldest_name = '';
    for ( readdir( D ))
    {
        next unless ( -f "$path/$_" );
        my $age = ( -M _ );   # note the "_": uses stat data loaded by "-f" above
        if ( $age > $oldest_age ) {
            $oldest_name = $_;
            $oldest_age  = $age;
        }
    }
    closedir D;
    return $oldest_name;
}
update: naturally, you'll want to use the "File::Spec" modules when you go to cross-platform usage.
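A minimal sketch of that File::Spec usage (the directory and file names here are made up for illustration): instead of hard-coding "/" as in "$path/$_" above, let File::Spec assemble the path with whatever separator the current platform uses.

```perl
use strict;
use warnings;
use File::Spec;

# Build "$path/$_"-style paths portably across *nix and Windows.
my $path = 'incoming';                    # hypothetical directory
my $name = 'report.txt';                  # hypothetical file name
my $full = File::Spec->catfile( $path, $name );
print "$full\n";
```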
Re: Quickest way to get the oldest file
by hardburn (Abbot) on Jan 19, 2004 at 21:06 UTC
my ($oldest) =
    map  { $_->[0] }
    sort { $b->[1] <=> $a->[1] }
    map  { [ $_, -M $_ ] }
    <*>;
The above isn't very memory-efficient, but is probably as quick as you can hope for. Change -M to -A or -C if necessary (see the relevant Perl docs). You might also want to check that the file is not a directory. The <*> might have to be changed to read a directory other than the current one.
---- I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
-- Schemer
: () { :|:& };:
Note: All code is untested, unless otherwise stated
Not only is this not memory-efficient, it's not speed-efficient either. There's no need to sort the entire list just to find the largest value. Even an efficient sort requires n*log(n) comparisons, and with 50,000 elements the log(n) factor starts to become non-negligible.
Other suggestions here that walk down the list and keep track of the largest value seen so far are much more efficient in both speed and memory: they need only n comparisons and negligible additional memory. It's a win-win situation.
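The single-pass scan described above can be sketched like this (the subroutine name is mine; the logic mirrors graff's code elsewhere in the thread): one trip through the directory, one running maximum, no sort.

```perl
use strict;
use warnings;

# Single-pass scan: O(n) comparisons, O(1) extra memory.
# Returns the name of the file with the largest -M value
# (most days since modification), i.e. the oldest file.
sub oldest_in {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Can't open $dir: $!";
    my ( $oldest, $max_age ) = ( undef, -1 );
    while ( defined( my $name = readdir $dh ) ) {
        next unless -f "$dir/$name";
        my $age = -M _;                       # reuse stat buffer from -f
        ( $oldest, $max_age ) = ( $name, $age ) if $age > $max_age;
    }
    closedir $dh;
    return $oldest;
}
```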
Re: Quickest way to get the oldest file
by mr_mischief (Monsignor) on Jan 19, 2004 at 21:20 UTC
You'd want to check the modification time or date for all the files. Then you'd want to do a typical minimum function on the epoch times at which they were modified (using data from stat()) or a maximum function on the number of days since the file was last modified (using data from -M for example). Since the -X file test ops handle partial days, I usually just use those instead of stat if I only need one value.
A quick and dirty test for the oldest file in a directory follows. It currently makes no check to see if it's pointing to a regular file or something else (socket, directory, etc.). It just prints the name of the oldest file and the number of days since it was modified. I tested it a little and it seems to work.
use strict;
use warnings;

my $max = 0;
my ( $time, $oldest );

opendir( my $d, '.' ) || die "Can't get file list: $!\n";
my @file = readdir( $d );
closedir( $d );

foreach ( @file )
{
    next if /^\.{1,2}$/;
    $time = -M $_;
    if ( $time > $max )
    {
        $max    = $time;
        $oldest = $_;
    }
}
print "$oldest was modified $max days ago\n";
Re: Quickest way to get the oldest file
by derby (Abbot) on Jan 19, 2004 at 21:04 UTC
I'm not sure you can get one solution for this due to differing filesystems on differing platforms. Even on *Nix platforms you cannot reliably do this (depending on your definition of oldest and file). I've always found it easier to control file naming but ... this snippet might help (at least on *Nix).
#!/usr/bin/perl
use IO::Dir;

tie %dir, 'IO::Dir', ".";

# assume ctime is close enough for creation time
foreach ( sort { $dir{$b}->ctime <=> $dir{$a}->ctime } keys %dir ) {
    print $_, " ", $dir{$_}->ctime, "\n";
}
-derby
update: Of course jweed got what I was talking about
(admittedly in a round-a-bout way). The OP wanted the oldest created file. In *Nix world, you cannot do that. There are three times associated with a file - last access, last modified and last change time. The -M solutions that follow will work
as long as your concept of modification and creation are the same. I can
create a file on Jan 1st, another on June 1. If I modify the Jan 1st file (and possibly in some non-significant manner) on Aug 1, then by the -M method, the Jan 1 file will be newer than the June 1 file. See - it all depends on how you define oldest. If your files are created but never modified (such as a caching scheme) then the -M works fine; however, if there is a chance that the files will be modified and you don't count the
modifications as making the file newer then checking the ctime
is probably better. But then again, that can have issues to ... so that's why
I normally suggest a naming convention (date/time) to remove ambiguity.
Even on *Nix platforms you cannot reliably do this
To clarify: (from the unix-faq):
3.1) How do I find the creation time of a file?
You can't - it isn't stored anywhere. Files have a last-modified time (shown by "ls -l"), a last-accessed time (shown by "ls -lu") and an inode change time (shown by "ls -lc"). The latter is often referred to as the "creation time" - even in some man pages - but that's wrong; it's also set by such operations as mv, ln, chmod, chown and chgrp.
The man page for "stat(2)" discusses this.
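In Perl, the same three timestamps the FAQ describes can be read straight from stat(); a minimal sketch (the choice of example file is arbitrary):

```perl
use strict;
use warnings;

# stat() returns atime, mtime and ctime as elements 8, 9 and 10:
# last access, last modification, and last inode change.
# None of them is a true creation time.
my $file = $0;   # use this script itself as a convenient example file
my ( $atime, $mtime, $ctime ) = ( stat $file )[ 8, 9, 10 ];
print "last access:   ", scalar localtime($atime), "\n";
print "last modified: ", scalar localtime($mtime), "\n";
print "inode change:  ", scalar localtime($ctime), "\n";
```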
Re: Quickest way to get the oldest file
by Roy Johnson (Monsignor) on Jan 19, 2004 at 21:06 UTC
Re: Quickest way to get the oldest file
by BrowserUk (Patriarch) on Jan 20, 2004 at 09:06 UTC
It will take a little memory, but it cracks on a bit :)
#! perl -slw
use strict;
use List::Util qw[ reduce ];

die "Usage: $0 <dir>" unless @ARGV > 0;

print reduce { -C $a > -C $b ? $a : $b } glob for @ARGV;
Or as a one-liner
perl -MList::Util=reduce -le"print reduce{-C $a>-C $b?$a:$b}glob for @ARGV;" cache/*/*/*/*
Caveat: Will need adjustments for *nix.
Re: Quickest way to get the oldest file
by duff (Parson) on Jan 20, 2004 at 03:04 UTC
Others have suggested you use a database and I tend to agree with them. It's unclear whether you meant 50,000 files or 50,000 directories each with some number of files but either way, searching through that many entries in a filesystem is slow (and hopefully it's not 50,000 files all in one directory!) Another mechanism that you might be able to use is encoding the time information in the file/directory names themselves. Something like:
2004/01/19/00/filename.01
2004/01/19/00/filename.02
2004/01/19/00/filename.03
...
2004/01/19/01/filename.01
...
2004/01/19/02/filename.01
...
In this hypothetical example the files are arranged by year/month/day/hour/filename.minute. You can see that it would be relatively easy to find the oldest file if you could arrange for such a structure. I don't know exactly whether this technique would be useful for your problem, but there it is. Or you could just use a database like PostgreSQL, MySQL, Berkeley DB, etc. (I believe all of these are available on both Linux and Windows) with an index created on the time of each entry. :-)
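Under a layout like that, lexical order matches chronological order, so the lookup reduces to a sorted glob; a sketch assuming duff's hypothetical year/month/day/hour/name.minute scheme (the subroutine name is mine):

```perl
use strict;
use warnings;

# Assumes the hypothetical year/month/day/hour/filename.minute layout,
# where sorting path names lexically also sorts them chronologically.
sub oldest_by_name {
    my ($root) = @_;
    my @files = sort glob("$root/*/*/*/*/*");
    return $files[0];   # the earliest path sorts first
}
```

No stat() calls at all; the directory structure itself carries the timestamps.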
Thanks all for your replies and helpful suggestions,
There will be multiple directories (but I will only be concerned with one at a time), the one that I will be using will be specified, and this may have up to 50,000 files in the one directory. This is the most that we have ever seen in one directory, but it is normally more like 1000. I do want to plan for the worst case scenario though. Unfortunately a file and directory naming structure like the one you are suggesting is not an option, as this will be specified by the client.
I ran graff's code from above and found it to be a reasonable speed. Perhaps a little slow when tested against 50,000 files, but I only ran it on this slow machine.
The only other idea that I had was to keep a record of the oldest file in a text file, and then this will only need to be updated when a file in that directory is deleted, or more to the point when that particular file is deleted. So I could return a response to the client, and then continue on with finding the oldest file, in the background... Any thoughts?
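That cache-file idea might look something like the following sketch. The cache file name and the update policy (rescan whenever the cached name no longer exists) are assumptions; the background-rescan part would sit on top of this.

```perl
use strict;
use warnings;

# Hypothetical sketch of the OP's idea: answer from a one-line text
# file when possible, and rescan only when the cache is missing or
# the cached file has been deleted.
my $cache = '.oldest_cache';   # assumed cache file name

sub oldest_cached {
    my ($dir) = @_;
    my $cache_path = "$dir/$cache";

    # Fast path: trust the cache if the file it names still exists.
    if ( open my $fh, '<', $cache_path ) {
        chomp( my $name = <$fh> );
        close $fh;
        return $name if defined $name && -f "$dir/$name";
    }

    # Slow path: single-pass rescan, then rewrite the cache.
    opendir my $dh, $dir or die "Can't read $dir: $!";
    my ( $oldest, $max ) = ( undef, -1 );
    for my $f ( readdir $dh ) {
        next if $f eq $cache;              # don't count the cache itself
        next unless -f "$dir/$f";
        my $age = -M _;
        ( $oldest, $max ) = ( $f, $age ) if $age > $max;
    }
    closedir $dh;
    if ( defined $oldest and open my $fh, '>', $cache_path ) {
        print $fh "$oldest\n";
        close $fh;
    }
    return $oldest;
}
```

The appeal is exactly what the OP describes: the web server's request is answered from one tiny file, and the expensive directory scan happens only after a deletion invalidates the cache.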
(z) Re: Quickest way to get the oldest file
by zigdon (Deacon) on Jan 21, 2004 at 20:43 UTC
Sometimes, the first file in the directory is the oldest one:
my $file;
opendir( DIR, $dir ) or die "Can't read $dir: $!";
$file = readdir( DIR ) until defined $file && $file =~ /^[^.]/;
print "First file is $file\n";
closedir DIR;
This should be pretty fast and memory efficient, since it doesn't actually read the whole directory. Of course, it won't work if the files in the directory can be modified after being placed there, as the oldest file will no longer be the first.
| [reply] [d/l] |