In the general case, you have no choice but to read (and stat) all the files in the directory, remembering the oldest one you have seen, in order to find the oldest file.
Without relying on a lot of implicit assumptions on how directory slots are allocated and freed in the face of creating and deleting files, you are not going to be able to produce anything robust.
How often do files get created? How often do you need to fetch the oldest file? Maybe you could get away with scanning the whole mess once every minute, and creating a symbolic link with a fixed name that points to the oldest file. That way you just have to open 'oldest.file'. If you are on a braindead operating system that does not implement symbolic links, you can emulate them by opening 'oldest.file' and writing the name of the oldest file that the scan turned up.
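To make that concrete, here is a minimal sketch of the symlink step, assuming $oldest_file already holds the name the scan turned up and that you are running from inside the directory in question:

    # point a fixed name at the oldest file the scan found
    unlink 'oldest.file';                          # throw away the stale link, if any
    symlink $oldest_file, 'oldest.file'
        or die "can't symlink to $oldest_file: $!";
    # (without symlinks, you would instead open 'oldest.file' and print $oldest_file into it)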
Another idea would be to cache the work. Once you have read the 10 000 files, write out the epoch time and the file name into a file named 'age.cache'. Each time new files come along, check whether the oldest files are still around, drop them from the cache if they are not, and append the newest files onto the end. That way, when you need to find the oldest file, it's the first record in the cache. On second thoughts, this would be a nightmare to get to run reliably.
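If you did try it anyway, the lookup side might look something like this (a sketch assuming one "epoch-time filename" record per line, oldest first, in the 'age.cache' file suggested above):

    open(CACHE, 'age.cache') or die "can't open age.cache: $!";
    my $oldest_file;
    while( <CACHE> ) {
        chomp;
        my ($age, $file) = split ' ', $_, 2;
        next unless -e $file;       # this one has since been deleted, skip it
        $oldest_file = $file;       # first surviving record is the oldest
        last;
    }
    close CACHE;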
<update> Thinking some more about this question last night led to the following point: Unix and NT systems will update the last-modified date of a directory each time a file is created or deleted in it. This gives you a dirty-bit flag, so you at least know whether anything has changed since the last time you looked at the files. But note that under NT (and I'm talking NT 4 here), this behaviour is configurable in the kernel. You can choose to turn it off if you want. I have an NT server at work that runs under a crushing load, and this is one of the speed optimisations I made. But you would probably know if you had done such a thing, and it's easy to test whether that is the case.
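A sketch of using that as a dirty bit, assuming you keep the directory's mtime from the previous pass in a variable of your own ($last_seen here is just an illustration):

    my $dir_mtime = (stat $dir)[9];     # last-modified time of the directory itself
    if( $dir_mtime > $last_seen ) {
        # a file was created or deleted since last time: rescan
        $last_seen = $dir_mtime;
        # ... run the full scan below to recompute the oldest file ...
    }
    else {
        # nothing has changed, the answer from the previous scan still stands
    }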
Also know that you'll take less of a performance hit (read: memory spike) if, instead of doing my @files = readdir(DIR), you do something like:
opendir(DIR, $dir) or die "can't opendir $dir: $!";   # $dir is your directory of 10 000 files
my $oldest      = time;
my $oldest_file = undef;
while( defined( my $file = readdir(DIR) )) {
    # or use File::Spec->catfile below for extra portability
    next if $file eq '.' or $file eq '..';
    my $age = (stat "$dir/$file")[9];   # mtime; stat relative to $dir, not the cwd
    if( $age < $oldest ) {
        $oldest      = $age;            # remember the oldest age seen (not the other way around)
        $oldest_file = $file;
    }
}
closedir(DIR);
That is, loop through the directory entry by entry rather than sucking the 10 000 entries in one hit, to reduce your memory footprint.</update>
Above all, note that having 1e5 files in a single directory is a pretty bad idea, and one that should be avoided at all costs. You should try saving files out into separate directories, based on the age of the file. If you divided the epoch time by 21600, you would be adding four new directories per day (one every six hours). Right now, the directory name would be 46447. You would then only have to go to the lowest-numbered directory and search within it, thereby drastically reducing the number of files you have to stat.
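A sketch of that layout, assuming a hypothetical base directory and the six-hour (21600-second) buckets suggested above:

    my $base   = '/var/spool/incoming';     # hypothetical base directory
    my $bucket = int( time / 21600 );       # e.g. 46447 at the time this was written

    # saving a new file: drop it into the current bucket
    mkdir "$base/$bucket", 0755 unless -d "$base/$bucket";
    # ... then create the file under "$base/$bucket/" ...

    # finding the oldest file: only scan the lowest-numbered bucket
    opendir(BASE, $base) or die "can't opendir $base: $!";
    my ($lowest) = sort { $a <=> $b } grep /^\d+$/, readdir(BASE);
    closedir(BASE);
    # ... now run the readdir/stat loop from the update over "$base/$lowest" only ...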
Otherwise, get a database.
--g r i n d e r