SkipHuffman has asked for the wisdom of the Perl Monks concerning the following question:

Basically I need to access files in a directory tree by modification date. Most recent first.

I want to provide a base directory and get an array of filenames (or a sorted list, or something else I can massage into what I need).

Edit: I do not need ALL of the files from the list. Most likely I need the most recent. If not that, then the next most recent, and so on.

I am certain someone, somewhere has made a module to do this, but I am not successfully finding it.

Please help, wise ones.

Re: How can I access a cross directory set of data files by most recently modified date?
by ikegami (Patriarch) on Jun 07, 2007 at 17:19 UTC

    From the CB:

    In my case I may not need to sort the entire list. I need to get the most recent file, then I may need the next most recent file, and perhaps the next one after that.

    If the latest file satisfies your need most of the time, you could do a linear search for it, then do something more expensive, such as a real sort, only if you need further info.

    my $newest_time = -1;   # lower than any real mtime
    my @newest_files;
    my %times;
    foreach my $file (@files) {
        my $time = (stat($file))[9];   # mtime
        $times{$file} = $time;
        if ($time > $newest_time) {
            $newest_time  = $time;
            @newest_files = ($file);
        }
        elsif ($time == $newest_time) {
            push @newest_files, $file;
        }
    }

    foreach my $file (@newest_files) {
        if ( ...[ this is the file we want ]... ) {
            return $file;
        }
    }

    @files = sort { $times{$b} <=> $times{$a} } @files;

    # Remove the ones we already checked.
    splice(@files, 0, scalar(@newest_files));

    foreach my $file (@files) {
        if ( ...[ this is the file we want ]... ) {
            return $file;
        }
    }

    You might want to benchmark against sorting the whole list.

    my %times;
    $times{$_} = (stat($_))[9] foreach @files;
    @files = sort { $times{$b} <=> $times{$a} } @files;

    Update: Fixed problems raised by shmem.

      ikegami,

      seems like you're slacking a bit.

      my $time = -d $file;

      Huh? A directory test to get the time? Or is it that the variable $time is spectacularly ill-chosen?

      if ($time > $newest_time) { $newest = $time;

      use strict; use warnings... :-)

      --shmem

      _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                    /\_¯/(q    /
      ----------------------------  \__(m.====·.(_("always off the crowd"))."·
      ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re: How can I access a cross directory set of data files by most recently modified date?
by moritz (Cardinal) on Jun 07, 2007 at 16:28 UTC

      I thought of that, but it seems kind of brute force for something someone probably has solved more elegantly.

        You can use a Schwartzian transform to make it more efficient by doing the stat call only once per file, but other than that there's not much more to it.

        Update: For those playing along at home, that'd be

        @files = map  { $_->[0] }
                 sort { $b->[1] <=> $a->[1] }   # newest first
                 map  { [ $_, (stat $_)[9] ] }
                 @files;
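        To tie the transform back to the original question (a base directory in, a newest-first list out), here is a minimal sketch that collects the files with the core File::Find module first. The sub name files_newest_first is made up for this example, and the `// 0` fallback for files that vanish before stat is an assumption, not something from the thread.

```perl
use strict;
use warnings;
use File::Find;

# Return every plain file under the given directories, newest first.
# files_newest_first is a hypothetical name chosen for this sketch.
sub files_newest_first {
    my @dirs = @_;
    my @files;

    find({
        no_chdir => 1,   # $_ stays the full path, so -f and stat work
        wanted   => sub { push @files, $_ if -f $_ },
    }, @dirs);

    # Schwartzian transform: stat each file once, sort on the cached mtime.
    return map  { $_->[0] }
           sort { $b->[1] <=> $a->[1] }
           map  { [ $_, (stat $_)[9] // 0 ] }   # 0 if the file vanished
           @files;
}

# Usage: print the newest file under the current directory.
# my ($newest) = files_newest_first('.');
# print "$newest\n" if defined $newest;
```

        This still sorts the whole list, so it fits the "not a huge number of files" case best.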

        Well, using a binary heap as a data structure might be more efficient, especially if you know that you need only the newest $k files; then every insertion and deletion takes O(log $k) time.

        But if you are talking about a more elegant interface: I don't know any :(

Re: How can I access a cross directory set of data files by most recently modified date?
by kyle (Abbot) on Jun 07, 2007 at 17:20 UTC

    If you have a huge number of files, Heap may be the way to go. According to my college algorithms book, building a heap from an unordered array is O(n) time. After that, you can pull the top element off and rebuild the heap in O(log n) time. In this way, you can get the "top n" elements without having to go through a full sort (which is O(n log n)).

    If there isn't a huge number of files, a Schwartzian Transform is probably the way to go.

    It would be interesting to Benchmark these two and see where the cut-off is. How big does the array have to be before the greater efficiency of the heap beats the greater efficiency of built-in sort written in C?
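    For anyone who wants to try the heap side of that comparison, here is a minimal hand-rolled sketch of the idea: build a max-heap on mtime in O(n), then pull off the newest entry in O(log n) each time another one is needed. It deliberately does not use the CPAN Heap module, and the names sift_down, build_heap and pop_newest are made up for this example.

```perl
use strict;
use warnings;

# Each heap element is [ $mtime, $name ]; the newest entry sits at index 0.

# Restore the max-heap property for the subtree rooted at index $i.
sub sift_down {
    my ($heap, $i) = @_;
    my $n = @$heap;
    while (1) {
        my $largest = $i;
        for my $child (2 * $i + 1, 2 * $i + 2) {
            $largest = $child
                if $child < $n && $heap->[$child][0] > $heap->[$largest][0];
        }
        last if $largest == $i;
        @$heap[$i, $largest] = @$heap[$largest, $i];   # swap
        $i = $largest;
    }
}

# Bottom-up construction from an unordered array: O(n) overall.
sub build_heap {
    my ($heap) = @_;
    sift_down($heap, $_) for reverse 0 .. int(@$heap / 2) - 1;
    return $heap;
}

# Remove and return the newest element: O(log n). Returns undef when empty.
sub pop_newest {
    my ($heap) = @_;
    return unless @$heap;
    my $top  = $heap->[0];
    my $last = pop @$heap;
    if (@$heap) {
        $heap->[0] = $last;
        sift_down($heap, 0);
    }
    return $top;
}

# Usage, assuming @files already holds the candidate paths:
# my $heap = build_heap([ map { [ (stat $_)[9], $_ ] } @files ]);
# while (my $entry = pop_newest($heap)) {
#     my ($mtime, $name) = @$entry;
#     ...   # stop as soon as this is the file you want
# }
```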

Re: How can I access a cross directory set of data files by most recently modified date?
by shmem (Chancellor) on Jun 07, 2007 at 19:28 UTC
    You could make an array of files and one of timestamps with fixed size, and use those to hold the newest files and timestamps, unshifting, popping and splicing:

    #!/usr/bin/perl
    use File::Find;
    use strict;

    die "usage: $0 n dir [dir ...]\nwhere n is number of items to report\n"
        unless @ARGV >= 2;

    my $n = (shift) - 1;
    my @d = @ARGV;
    my @f;
    my @t;
    $#f = $n;   # preallocate one slot per requested item
    $#t = $n;   # for these arrays

    # no_chdir keeps $File::Find::name valid for -d and stat
    # even when the directories are given as relative paths
    find({ wanted => \&wanted, no_chdir => 1 }, @d);

    print join("\n",
        map  { scalar(localtime((stat $_)[9])) . " $_" }
        grep { defined } @f   # skip unused slots
    ), "\n";

    sub wanted {
        my $file = $File::Find::name;
        return if -d $file;
        my $time = (stat($file))[9];   # mtime
        # if file is newer than the first file...
        if ($t[0] < $time) {
            unshift @t, $time;
            unshift @f, $file;
            pop @t, pop @f if $#t > $n;
            return;
        }
        # ...else insert the found file in the list.
        for (my $i = 1; $i <= $#t; $i++) {
            if ($time >= $t[$i]) {
                # oops splice @t,$i,1, $time;
                # oops splice @f,$i,1, $file;
                splice @t, $i, 0, $time;
                splice @f, $i, 0, $file;
                pop @t, pop @f if $#t > $n;
                return;
            }
        }
    }

    SkipHuffman++ - seems like I needed that utility :-)

    update: it has to be splice @t,$i,0, $time; certainly, not 1 ...

    --shmem

Re: How can I access a cross directory set of data files by most recently modified date?
by thezip (Vicar) on Jun 07, 2007 at 19:27 UTC

    Here's yet another WTDI:

    This solution maintains a queue/array of size N (e.g., to store the names of the newest 5 files):

    Linearly traverse the directory structure
        Compare timestamp of current file to timestamp of oldest/last item in queue
        If file is newer
            Seek (the sorted) insertion point in the queue, and insert via splice
            If queue size > N
                pop off the oldest element from the queue
    Repeat until finished...

    When finished, you'll have a sorted array containing the newest N files.

    Update: BTW, you'll need to save the appropriate stat info for each queue element
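    A minimal sketch of that queue, under a couple of assumptions: each queue element is a [ $mtime, $name ] pair (which is one way to save the stat info mentioned in the update), and offer_file is a made-up helper name. The caller is expected to do the directory traversal and call it once per file.

```perl
use strict;
use warnings;

# Offer one file to a newest-first queue that holds at most $max entries.
# Each entry is [ $mtime, $name ]. offer_file is a hypothetical helper.
sub offer_file {
    my ($queue, $max, $mtime, $name) = @_;
    return if $max < 1;

    # Cheap rejection: the queue is full and this file is no newer
    # than the oldest entry we are keeping.
    return if @$queue >= $max && $mtime <= $queue->[-1][0];

    # Seek the insertion point: the first entry older than the new file.
    # A linear scan is fine because N is small.
    my $i = 0;
    $i++ while $i < @$queue && $queue->[$i][0] >= $mtime;

    splice @$queue, $i, 0, [ $mtime, $name ];   # insert, keeping the order
    pop @$queue if @$queue > $max;              # drop the oldest if over N
    return;
}

# Usage, assuming @files already holds the candidate paths:
# my @newest;
# offer_file(\@newest, 5, (stat $_)[9], $_) for @files;
# print "$_->[1]\n" for @newest;
```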


    Where do you want *them* to go today?