In the general case, you have no choice but to read (and stat) all the files in the directory, remembering the oldest one you have seen, in order to find the oldest file.

Any shortcut would have to rely on a lot of implicit assumptions about how directory slots are allocated and freed as files are created and deleted, and nothing built on such assumptions is going to be robust.

How often do files get created? How often do you need to fetch the oldest file? Maybe you could get away with scanning the whole mess once every minute, and creating a symbolic link with a fixed name that points to the oldest file. That way you just have to open 'oldest.file'. If you are on a braindead operating system that does not implement symbolic links, you can emulate them by opening 'oldest.file' and writing into it the name of the oldest file that the scan turned up.
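A minimal sketch of that periodic scan, assuming a platform with symbolic links. The sub names find_oldest and refresh_oldest_link are made up for illustration, as is the temporary link name; the link is built under a temporary name and renamed into place so that readers never see a missing 'oldest.file'.

```perl
use strict;
use warnings;
use File::Spec;

# Return the name of the plain file in $dir with the smallest mtime,
# or undef if there is none. Symbolic links are skipped, so the
# 'oldest.file' link never competes with the real files.
sub find_oldest {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my ($oldest_file, $oldest_mtime);
    while (defined(my $file = readdir($dh))) {
        next if $file eq '.' or $file eq '..';
        my $path = File::Spec->catfile($dir, $file);
        my @st = lstat($path) or next;   # entry vanished during the scan
        next unless -f _;                # plain files only, no symlinks
        if (!defined $oldest_mtime or $st[9] < $oldest_mtime) {
            ($oldest_file, $oldest_mtime) = ($file, $st[9]);
        }
    }
    closedir($dh);
    return $oldest_file;
}

# Point $dir/$link_name at the oldest file. The link target is a bare
# file name, so it resolves relative to $dir itself.
sub refresh_oldest_link {
    my ($dir, $link_name) = @_;
    my $oldest = find_oldest($dir);
    return unless defined $oldest;
    my $link = File::Spec->catfile($dir, $link_name);
    my $tmp  = "$link.tmp.$$";           # hypothetical scratch name
    unlink $tmp;                         # clear any leftover from a crash
    symlink($oldest, $tmp) or die "Cannot create $tmp: $!";
    rename($tmp, $link)    or die "Cannot rename $tmp to $link: $!";
    return $oldest;
}
```

Run something like refresh_oldest_link('/path/to/spool', 'oldest.file') from cron once a minute (the path is a placeholder), and consumers only ever open 'oldest.file'.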

Another idea would be to cache the work. Once you have read the 10 000 files, write out the epoch time and the file name of each into a file named 'age.cache'. Each time new files come along, see if the oldest files are still around, drop them from the cache if they are not, and add the newest files onto the end. That way, when you need to find the oldest file, it's the first record in the cache. On second thoughts, this would be a nightmare to get to run reliably.

<update> Thinking some more about this question last night led to the following point: Unix and NT systems will update the last modified date of a directory each time a file is created or deleted. This gives you a dirty-bit flag, so you at least know whether anything has changed since the last time you looked at the files. But note that under NT (and I'm talking NT 4 here), this behaviour is configurable in the kernel. You can choose to turn this off if you want. I have an NT server at work that runs under a crushing load, and this is one of the speed optimisations I made. But you probably know if you did such a thing, and it's easy to test whether it is the case.

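Also note that you'll have less of a performance hit (read: memory spike) if instead of doing my @files = readdir(DIR) you do something like:

    my $oldest      = time;
    my $oldest_file = undef;
    while( defined( my $file = readdir(DIR) )) {
        next if $file eq '.' or $file eq '..';
        # stat on a bare name only works if the current directory is the
        # one being read; otherwise prepend the directory
        # (or use File::Spec for extra portability)
        my $age = (stat $file)[9];
        next unless defined $age;    # file vanished between readdir and stat
        if( $age < $oldest ) {
            $oldest      = $age;     # remember the smallest mtime seen so far
            $oldest_file = $file;
        }
    }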

That is, loop through entry by entry rather than sucking the 10 000 entries in one hit, to reduce your memory footprint.</update>
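The directory-timestamp check from the update can be sketched roughly as follows. The sub name dir_changed_since is made up, and the sketch assumes the platform really does bump the directory's modification time on create and delete. Since stat reports whole seconds here, a change landing in the same second as the previous check can go unnoticed.

```perl
use strict;
use warnings;

# Compare the directory's current mtime with the one remembered from
# the previous check. Returns (changed?, current mtime); pass undef as
# $last_seen on the first call to force a rescan.
sub dir_changed_since {
    my ($dir, $last_seen) = @_;
    my $mtime = (stat $dir)[9];
    die "Cannot stat $dir: $!" unless defined $mtime;
    my $changed = (!defined $last_seen or $mtime != $last_seen) ? 1 : 0;
    return ($changed, $mtime);
}
```

Keep the returned mtime next to whatever result you cached, and only rescan the directory when the first return value is true.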

Above all, note that having 1e5 files in a single directory is a pretty bad idea, and one that should be avoided at all costs. You should try saving files out into separate directories, based on the age of the file. If you divided the epoch time by 21600 and dropped the fraction, you would be adding four new directories per day (one every six hours). At the time of writing, the directory name would be 46447. You would then only have to go to the lowest-numbered directory and search within it, thereby drastically reducing the number of files you would have to stat.
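As a rough sketch of that layout: the sub names bucket_for and oldest_bucket are invented for the example, and it assumes the bucket directories live directly under one root and have purely numeric names.

```perl
use strict;
use warnings;
use File::Spec;

use constant BUCKET_SECONDS => 21600;    # six hours

# Directory name for a file created at $epoch: the epoch time divided
# by 21600, fraction dropped.
sub bucket_for {
    my ($epoch) = @_;
    return int($epoch / BUCKET_SECONDS);
}

# Lowest-numbered bucket directory under $root, or undef if none exist.
# Only that one directory then needs to be scanned for the oldest file.
sub oldest_bucket {
    my ($root) = @_;
    opendir(my $dh, $root) or die "Cannot open $root: $!";
    my $lowest;
    while (defined(my $entry = readdir($dh))) {
        next unless $entry =~ /^\d+$/;                      # numeric names only
        next unless -d File::Spec->catdir($root, $entry);   # directories only
        $lowest = $entry if !defined $lowest or $entry < $lowest;
    }
    closedir($dh);
    return $lowest;
}
```

A writer would save each new file under bucket_for(time), and a reader would start from oldest_bucket($root) and remove a bucket directory once it is empty.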

Otherwise, get a database.

--
g r i n d e r

In reply to Re: How to get the oldest file in a directory without reading all files? by grinder
in thread How to get the oldest file in a directory without reading all files? by Marcello
