in reply to Print the oldest file in a directory.

I can't believe asking the OS for the mod-time of every file some 1000 times is going to be faster than caching that in the program space of your program.

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.

Re: Print the oldest file in a directory.
by gnu@perl (Pilgrim) on Oct 30, 2002 at 21:08 UTC
    Yeah, it doesn't make sense to me either. I had already written the little one-liner when someone here suggested using an ST instead, for exactly the reason you stated above: I am checking the mod time of each file at least twice, and more often if that file takes part in further <=> comparisons.
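To put a number on the "at least twice" point, here is a minimal sketch that counts key lookups for a plain sort versus an ST. It makes no filesystem calls: the names, the ages, and the age_of() helper are all made up for illustration, with age_of() standing in for -M so the calls can be counted.

```perl
use strict;
use warnings;

# Stand-in data (hypothetical): 1000 made-up names with distinct made-up ages.
my @names = map { "file$_" } 1 .. 1000;
my %age;
@age{@names} = map { ($_ * 7919) % 1009 } 1 .. 1000;

# age_of() plays the role of -M so that every lookup can be counted.
my $lookups = 0;
sub age_of { $lookups++; return $age{ $_[0] } }

# Plain sort: every comparison looks up both operands again.
my @plain = sort { age_of($b) <=> age_of($a) } @names;
my $plain_lookups = $lookups;

# Schwartzian Transform: one lookup per name, cached next to the name.
$lookups = 0;
my @st =
    map  { $_->[0] }
    sort { $b->[1] <=> $a->[1] }
    map  { [ $_, age_of($_) ] }
    @names;
my $st_lookups = $lookups;

print "plain sort: $plain_lookups lookups\n";
print "ST:         $st_lookups lookups\n";
```

The ST does exactly one lookup per name, while the plain sort does two per comparison, so its total grows with the number of comparisons the sort needs. Whether that shows up as wall-clock time with real -M calls depends on how expensive each stat is on the system in question.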

    Please keep in mind, this is the first time I have ever used an ST. It is pretty much copied directly from http://www.5sigma.com/perl/schwtr.html.

    Here is the benchmark code I used to test it with the results. The directory had 16,108 files in it. Let me know what you think.

    #!/usr/bin/perl -w
    use strict;
    use Benchmark;

    timethese(100, {
        'chad'   => \&chad,
        'swartz' => \&st,
    });

    sub chad {
        sort { (-M $b) <=> (-M $a) } glob("*");
    }

    sub st {
        map  { $_->[0] }
        sort { $a->[1] <=> $b->[1] }
        map  { [$_, -M] }
        glob('*');
    }

    Here are the results of the benchmark:

    Benchmark: timing 100 iterations of chad, swartz...
          chad: 118 wallclock secs (56.91 usr + 47.25 sys = 104.16 CPU) @ 0.96/s (n=100)
        swartz: 1515 wallclock secs (142.33 usr + 1044.36 sys = 1186.69 CPU) @ 0.08/s (n=100)
        Yup, DOH! Technically I don't even need the <=>. I could just take the last output of glob. Still an expensive way of doing it. After reviewing the input here, it would definitely be better to perform an ST on an array populated using opendir/readdir.

        Thanks for your input everybody.
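A minimal sketch of the opendir/readdir plus ST approach mentioned above, for anyone who wants to try it. The sub name oldest_file() is made up for this example, and restricting the list to plain files with -f is an assumption about what "oldest file" should mean.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the name of the oldest plain file in $dir,
# or undef if the directory contains no plain files.
sub oldest_file {
    my ($dir) = @_;

    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @files = grep { -f "$dir/$_" } readdir $dh;   # skips '.', '..' and subdirs
    closedir $dh;

    # ST: read each file's age (-M, in days) once, then sort on the cached value.
    my @sorted =
        map  { $_->[0] }
        sort { $b->[1] <=> $a->[1] }                 # largest age first = oldest first
        map  { [ $_, -M "$dir/$_" ] }
        @files;

    return $sorted[0];
}

my $oldest = oldest_file('.');
print defined $oldest ? "$oldest\n" : "(no files)\n";
```

Note that readdir returns bare names, so the directory has to be prepended before any file test. If only the single oldest file is wanted, one pass that remembers the largest -M seen so far would avoid the sort altogether, but that is a different technique from the ST discussed here.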

      Methinks there is some kind of optimization going on. Perhaps perl knows that a sorted array in scalar (or void?) context is just the length of the array, and so the sort can be optimized away in the first case? Also, you really should take the glob out of the equation (it's an expensive operation and heavily skews the results of the benchmark). You also have '$a <=> $b' in one and '$b <=> $a' in the other. With the code below I get the ST being more than twice as fast (BTW, I have about 63 files in the directory):
      use Benchmark;

      opendir(DIR, ".") or die "Acck: $!";
      my @files = readdir DIR;
      closedir DIR;

      my $num_files = @files;
      print "$num_files\n";

      timethese(-4, {
          'chad'   => \&chad,
          'swartz' => \&st,
      });

      sub chad {
          my @list = sort { (-M $b) <=> (-M $a) } @files;
      }

      sub st {
          my @list =
              map  { $_->[0] }
              sort { $b->[1] <=> $a->[1] }
              map  { [$_, -M] }
              @files;
      }
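The guess about context can be probed with wantarray. The sketch below does not look inside Benchmark at all, so the context in which timethese calls its code refs remains an assumption. It only shows that the last statement of a sub inherits the caller's context, and that assigning to an array (as the rewritten subs above do) forces list context either way. The sub names probe(), last_statement() and assigned() are made up for the example.

```perl
use strict;
use warnings;

my @seen;   # contexts observed by probe(), in call order

# Records the context it was called in: wantarray is true in list context,
# false but defined in scalar context, and undef in void context.
sub probe {
    push @seen, wantarray ? 'list' : defined(wantarray) ? 'scalar' : 'void';
    return;
}

# Like the original benchmark subs: the last statement inherits
# whatever context the sub itself was called in.
sub last_statement { probe() }

# Like the rewritten subs: the array assignment forces list context.
sub assigned { my @list = probe(); }

last_statement();              # bare statement: void context
my @l = last_statement();      # list context
my $s = last_statement();      # scalar context
assigned();                    # void call, but probe() still sees list

print join(', ', @seen), "\n"; # prints "void, list, scalar, list"
```

So if the code refs are called without their return value being used, the bare sort in the first 'chad' sub would run in void context, while the sort inside the ST is an argument to map and always runs in list context. That would fit the lopsided numbers, though it is not proof of what perl does with a void-context sort.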