swaroop has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I need to open a directory and process the current log file to analyze its contents, so I need to find the most recently modified file in that directory.

We can use a directory handle with opendir, but readdir lists files by naming convention, not by modification time. In addition, I personally do not like to shell out to UNIX commands like "@array = `ls -lrt`".

Any modules to do this work?

Help!!!

Thanks,
Swaroop

Replies are listed 'Best First'.
Re: how to list the files in dir with respect to time
by Zaxo (Archbishop) on Aug 02, 2005 at 02:21 UTC

    You could sort by mtime from stat or, better, apply List::Util::reduce() to the filename list. We'll insert a Schwartzian Transform-like mapping to reduce the number of stat calls.

    use List::Util 'reduce';
    my $newest = (
        reduce { $a->[1] < $b->[1] ? $b : $a }
        map    { [ $_, (stat)[9] ] }
        glob '/path/to/*.log'
    )->[0];
    Untested.

    After Compline,
    Zaxo

      The List::Util stuff is definitely cool, but I don't quite understand why you say it reduces the number of stat calls. I would have thought that you need to stat every candidate file exactly once, whether you use List::Util 'reduce' or something more mundane like a hash:
      my %file_date;
      for ( glob '/path/to/*.log' ) {
          $file_date{$_} = (stat)[9];
      }
      my $newest = ( sort { $file_date{$b} <=> $file_date{$a} } keys %file_date )[0];

      (update: changed name of hash to be consistent with what it stores)

        The map reduces the number of stat calls by half. Compare to:

        use List::Util 'reduce';
        my $newest = reduce { (stat $a)[9] < (stat $b)[9] ? $b : $a }
                     glob '/path/to/*.log';

        It's not possible to use the *_ handle for a cached stat call there, because we don't know that the currently reduced name was the subject of the most recent stat call.
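        For reference, here is a minimal sketch of the case where the cached `_` handle does apply: consecutive file tests against the same file reuse the stat buffer from the previous test, so only one system call is made. (Using the running script itself as an example file that certainly exists.)

        ```perl
        my $file = $0;    # the running script itself, as an example file

        # -e performs the stat; -M _ reuses the cached stat buffer
        # from that test instead of making a second system call.
        if ( -e $file ) {
            my $age_days = -M _;
            print "$file was modified $age_days days ago\n";
        }
        ```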

        Your hash approach is fine, too, but it would also benefit from reduce() over sort.
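        A sketch of that suggestion (the newest_file() helper name is just for illustration): stat each file exactly once into the hash, then take the maximum with reduce() in a single O(n) pass instead of an O(n log n) sort.

        ```perl
        use List::Util 'reduce';

        # Hypothetical helper: cache each mtime once, then reduce.
        sub newest_file {
            my ($pattern) = @_;
            my %file_date;
            $file_date{$_} = (stat)[9] for glob $pattern;
            return reduce { $file_date{$a} > $file_date{$b} ? $a : $b }
                   keys %file_date;
        }
        ```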

        After Compline,
        Zaxo

        Thanks zaxo! You beat me to the punch :)

        I was also confused about why zaxo said it would reduce the number of stat calls. I have yet to familiarize myself with the Schwartzian Transform, so I will read up on that before I ask any questions about it :)

        Seems like the OP wanted the last modified file. So why store all files and then sort, instead of just tracking the max?

        Modifying graff's code here

        Update: added logic to update $max. Thanks graff; my oversight.

        my %file_date;
        my $max = -99999999; # setting it to the first file's mtime would be better,
                             # but this is just for demonstration
        my $mtime;
        my $file;
        for ( glob '/path/to/*.log' ) {
            $mtime = (stat)[9];
            if ( $max <= $mtime ) {
                $file = $_;
                $max  = $mtime;
            }
        }
        print "file = $file\n";
        # my $newest = ( sort { $file_date{$b} <=> $file_date{$a} } keys %file_date )[0];

      Regarding the OP's question about finding the modification time: in addition to stat, it is also worth knowing about File::stat, which provides a more readable wrapper so you don't have to remember (or look up) that index 9 means mtime.

      map { [ $_, stat($_)->mtime ] }

      This is, of course, an efficiency/readability tradeoff, but this may be acceptable in many cases.
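      Putting that together with zaxo's reduce() version, a sketch (the newest_by_mtime() name is just for illustration):

      ```perl
      use File::stat;
      use List::Util 'reduce';

      # Hypothetical helper: the same stat-once map over [name, mtime]
      # pairs, but with File::stat's named accessor instead of index 9.
      sub newest_by_mtime {
          my ($pattern) = @_;
          my $pair = reduce { $a->[1] < $b->[1] ? $b : $a }
                     map    { [ $_, stat($_)->mtime ] }
                     glob $pattern;
          return $pair && $pair->[0];
      }
      ```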

      If unfamiliar with the Schwartzian Transform, the OP may want to see Understanding transformation sorts (ST, GRT), the details.

      -xdg

      Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

Re: how to list the files in dir with respect to time
by greenFox (Vicar) on Aug 02, 2005 at 05:11 UTC

    If all you want is the very latest file then a straight out compare as you traverse the file list will do...

    my $latest;
    $latest->{mtime} = 0;
    for ( glob '/path/to/logs/*.log' ) {
        next if -d;
        my $mtime = (stat)[9];
        if ( $mtime > $latest->{mtime} ) {
            $latest->{file}  = $_;
            $latest->{mtime} = $mtime;
        }
    }
    print $latest->{file}, "\n";

    Update: I see sk already posted this...

    --
    Murray Barton
    Do not seek to follow in the footsteps of the wise. Seek what they sought. -Basho

Re: how to list the files in dir with respect to time
by 5mi11er (Deacon) on Aug 02, 2005 at 14:43 UTC
    I appreciate that swaroop prefers not to use Unix commands, and the stat operations should be platform-agnostic (provided perl supports the platform), whereas a Unix command is tied to a Unix variant and would likely not work under, say, Windows (without Cygwin or the like).

    However, given a constraint that says this will only be used under Unix, is the overhead of a system call worse than stat'ing each and every file, which must at some point be translated into system calls as well? I would think that the `ls -lrt` option would be the more CPU-efficient operation.

    Yeah, I should run a benchmark. I've resisted this for a long time, but there's no time like the present... I'll post an update when I get done; unless someone beats me to it.

    -Scott

    Update:

    
                 Rate graff_hash zaxo_first many_stats sk_maxfile the_ls_lrt
    graff_hash 25.3/s         --       -14%       -21%       -40%       -84%
    zaxo_first 29.3/s        16%         --        -9%       -31%       -81%
    many_stats 32.3/s        27%        10%         --       -24%       -79%
    sk_maxfile 42.4/s        67%        45%        31%         --       -73%
    the_ls_lrt  155/s       513%       430%       381%       266%         --
    
    Ok, I think something's off: I'd expect "many_stats" to be the worst-performing option, but it turns out to be not so bad. This is run on our /usr/bin directory with 1643 files...
      Ok, after sleeping on it, and re-examining the code, I've got a better handle on the benchmarks.

      My first mistake was not fully reading the "many_stats" routine; it uses List::Util::reduce, not the sort routine as I'd assumed/glossed over. So I added a test that did use the worst non-contrived combination of stat and sort, and THAT gave me the results I expected.

                   Rate   most_stats  graff_hash  zaxo_first  more_stats  sk_maxfile  the_ls_lrt
      most_stats   4.64/s       --         -82%        -84%        -86%        -89%        -97%
      graff_hash  25.6 /s      452%         --         -14%        -22%        -41%        -82%
      zaxo_first  29.8 /s      543%         16%         --          -9%        -31%        -79%
      more_stats  32.9 /s      608%         28%         10%         --         -24%        -76%
      sk_maxfile  43.3 /s      832%         69%         45%         32%         --         -69%
      the_ls_lrt 140   /s     2909%        445%        368%        325%        223%         --
      
      Doing two stats for every compare in the sort routine is REALLY bad, and the next worst option is graff's caching the file dates in a hash, then sorting.

      Then we have the two attempts using List::Util::reduce; I don't quite understand how zaxo's caching of dates can be worse than stat'ing during each compare. My only guess is that building the [name, mtime] array refs and all the dereferencing carries enough of a penalty to outweigh the saved stat calls.

      Then we see that sk's function to pull only the newest file out as we're going through the array is a bit better than the reduce options. And finally letting the system's 'ls' routine do most of the work for us is far and away the best option (ignoring portability issues).

      Code follows:

      -Scott
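
      The benchmark code itself did not survive in this thread; as a rough sketch of how such a comparison could be set up with the core Benchmark module (the entry names echo the table above; a scratch directory stands in for /usr/bin, and the iteration count is arbitrary):

      ```perl
      use Benchmark qw(cmpthese);
      use List::Util qw(reduce);
      use File::Temp qw(tempdir);

      # Scratch directory of dummy log files, so the comparison is
      # self-contained rather than depending on /usr/bin.
      my $dir = tempdir( CLEANUP => 1 );
      for my $i ( 1 .. 50 ) {
          open my $fh, '>', "$dir/f$i.log" or die $!;
          close $fh;
      }

      cmpthese( 100, {
          # zaxo's stat-once reduce over [name, mtime] pairs
          zaxo_first => sub {
              my $newest = ( reduce { $a->[1] < $b->[1] ? $b : $a }
                             map { [ $_, (stat)[9] ] }
                             glob "$dir/*.log" )->[0];
          },
          # shelling out and letting ls sort by mtime
          the_ls_lrt => sub {
              chomp( my @files = `ls -rt $dir` );
              my $newest = "$dir/$files[-1]";
          },
      } );
      ```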