Re: how to list the files in dir with respect to time
by Zaxo (Archbishop) on Aug 02, 2005 at 02:21 UTC
You could sort by mtime from stat or, better, apply List::Util::reduce() to the filename list. We'll insert a Schwartzian Transform-like mapping to reduce the number of stat calls.
use List::Util 'reduce';
my $newest = (
    reduce { $a->[1] < $b->[1] ? $b : $a }
    map { [ $_, (stat)[9] ] }
    glob '/path/to/*.log'
)->[0];
Untested.
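For anyone who wants to try this as a standalone script, here is a minimal sketch of the same idea: pair each name with its mtime once, then let reduce keep the larger pair. The helper name newest_file() is made up for illustration, and the check that drops files which cannot be stat'ed is an added assumption, not part of the snippet above.

```perl
use strict;
use warnings;
use List::Util 'reduce';

# newest_file() is a hypothetical helper name, not part of List::Util.
# Returns the most recently modified path, or undef for an empty list.
sub newest_file {
    # Pair each name with its mtime so stat runs exactly once per file;
    # drop entries whose stat failed (assumption: skip unreadable files).
    my @pairs = grep { defined $_->[1] }
                map  { [ $_, (stat $_)[9] ] } @_;

    # Keep whichever pair carries the larger mtime.
    my $best = reduce { $a->[1] < $b->[1] ? $b : $a } @pairs;
    return $best ? $best->[0] : undef;
}

my $newest = newest_file( glob '/path/to/*.log' );
print defined $newest ? "$newest\n" : "no matching files\n";
```

The reduce pass is linear in the number of files, whereas a full sort does more work than is needed to pick out a single maximum.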
The List::Util stuff is definitely cool, but I don't quite understand why you say it reduces the number of stat calls. I would have thought that you need to stat every candidate file just once, whether you use List::Util 'reduce' or something like the more mundane hash:
my %file_date;
for ( glob '/path/to/*.log' ) {
    $file_date{$_} = (stat)[9];
}
my $newest = ( sort { $file_date{$b} <=> $file_date{$a} } keys %file_date )[0];
(update: changed name of hash to be consistent with what it stores)
use List::Util 'reduce';
my $newest = reduce {
    (stat $a)[9] < (stat $b)[9] ? $b : $a
} glob '/path/to/*.log';
It's not possible to use the *_ handle for a cached stat call there, because we don't know that the currently reduced name was the subject of the most recent stat call.
Your hash approach is fine, too, but it would also benefit from reduce() over sort.
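As a rough illustration of that last point, the hash can be kept as it is and only the final step changed from sort to reduce. This is a sketch under assumptions: newest_by_hash() is a hypothetical name, and skipping files whose stat fails is an added guard.

```perl
use strict;
use warnings;
use List::Util 'reduce';

# newest_by_hash() is a hypothetical name used for illustration only.
sub newest_by_hash {
    my %file_date;
    for my $path (@_) {
        my $mtime = (stat $path)[9];    # one stat per file
        $file_date{$path} = $mtime if defined $mtime;
    }

    # A single linear pass over the keys replaces the full sort.
    return reduce { $file_date{$a} < $file_date{$b} ? $b : $a }
           keys %file_date;
}
```

Called as newest_by_hash( glob '/path/to/*.log' ), it returns the newest name, or undef when nothing matched.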
Thanks zaxo! You beat me to the punch :)
I was also confused about why zaxo said it would reduce the number of stat calls. I have yet to familiarize myself with the ST, so I will read up on it before I ask any questions about it :)
It seems the OP wanted the last modified file. So why store all the files and then sort, instead of taking a simple max?
Modifying graff's code here
Update : Added logic to update $max. Thanks graff. my oversight.
my %file_date;
my $max = -99999999; # set it to first file's mtime would be better
                     # but just for demonstration here
my $mtime;
my $file;
for ( glob '/path/to/*.log' ) {
    $mtime = (stat)[9];
    if ($max <= $mtime) {
        $file = $_;
        $max = $mtime;
    }
}
print ("file = $file\n");
# my $newest = ( sort { $file_date{$b} <=> $file_date{$a} } keys %file_date )[0];
Regarding the OP's question about finding the modification time: in addition to stat, it is also worth knowing about File::stat, a more readable wrapper that saves you from having to remember or look up that 9 means mtime.
map { [ $_, stat($_)->mtime ] }
This is, of course, an efficiency/readability tradeoff, but this may be acceptable in many cases.
OP may want to see Understanding transformation sorts (ST, GRT), the details if unfamiliar with the Schwartzian Transform.
-xdg
Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.
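To show the File::stat fragment in context, here is a minimal self-contained sketch. The helper name newest_file_oo() is hypothetical, and skipping paths that cannot be stat'ed is an added assumption. Note that once File::stat is loaded, stat returns an object rather than a list, so the (stat)[9] idiom no longer applies in the same file.

```perl
use strict;
use warnings;
use File::stat;              # stat() now returns an object with named accessors
use List::Util 'reduce';

# newest_file_oo() is a hypothetical helper name.
sub newest_file_oo {
    my @pairs;
    for my $path (@_) {
        my $st = stat($path) or next;          # skip paths that cannot be stat'ed
        push @pairs, [ $path, $st->mtime ];    # ->mtime instead of index 9
    }
    my $best = reduce { $a->[1] < $b->[1] ? $b : $a } @pairs;
    return $best ? $best->[0] : undef;
}
```

The object construction costs a little per file, which is the efficiency/readability tradeoff mentioned above.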
Re: how to list the files in dir with respect to time
by greenFox (Vicar) on Aug 02, 2005 at 05:11 UTC
If all you want is the very latest file then a straight out compare as you traverse the file list will do...
my $latest;
$latest->{mtime} = 0;
for (glob '/path/to/logs/*.log' ) {
    next if -d;
    my $mtime = (stat)[9];
    if ($mtime > $latest->{mtime}){
        $latest->{filename} = $_;
        $latest->{mtime} = $mtime;
    }
}
print $latest->{filename}, "\n";
Update: I see sk already posted this...
-- Murray Barton Do not seek to follow in the footsteps of the wise. Seek what they sought. -Basho
Re: how to list the files in dir with respect to time
by 5mi11er (Deacon) on Aug 02, 2005 at 14:43 UTC
I appreciate that swaroop prefers not to use Unix commands, and I've also noticed that the stat operations should be platform agnostic, provided the platform is supported by perl, whereas a Unix command would be tied to a Unix variant and would likely not work under, say, Windows (without cygwin or other variants).
However, given a constraint that says we will be running under Unix, is the overhead of a system call worse than stat'ing each and every file, which must at some point be converted to a system call as well? I would think that the `ls -lrt` option would have to be the more CPU-efficient operation. Yeah, I should run a benchmark. I've resisted this for a long time, but there's no time like the present... I'll post an update when I get done, unless someone beats me to it.
-Scott
Update:
             Rate graff_hash zaxo_first many_stats sk_maxfile the_ls_lrt
graff_hash 25.3/s         --       -14%       -21%       -40%       -84%
zaxo_first 29.3/s        16%         --        -9%       -31%       -81%
many_stats 32.3/s        27%        10%         --       -24%       -79%
sk_maxfile 42.4/s        67%        45%        31%         --       -73%
the_ls_lrt  155/s       513%       430%       381%       266%         --
Ok, I think something's off. I'd expect "many_stats" to be the worst-performing option, but it turns out to be not so bad. This was run on our /usr/bin directory with 1643 files...
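For reference, an ls-based lookup could look roughly like the sketch below. It is not the routine that was benchmarked. The helper name newest_via_ls() is hypothetical, -l is dropped so the long format does not have to be parsed, and the code assumes a Unix ls and file names without embedded newlines.

```perl
use strict;
use warnings;

# newest_via_ls() is a hypothetical helper; Unix-only, and it assumes
# no file name in the directory contains a newline.
sub newest_via_ls {
    my ($dir) = @_;

    # List-form pipe open runs ls directly, without going through a shell.
    # -rt sorts by modification time, oldest first, so the newest is last.
    open my $ls, '-|', 'ls', '-rt', '--', $dir or die "cannot run ls: $!";
    chomp( my @names = grep { /\.log$/ } <$ls> );
    close $ls;

    return @names ? "$dir/$names[-1]" : undef;
}

# Usage (path is a placeholder):
#   my $newest = newest_via_ls('/path/to');
#   print defined $newest ? "$newest\n" : "no matching files\n";
```

One process is spawned per lookup, but ls then does all the stat'ing and sorting in compiled code.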
Ok, after sleeping on it and re-examining the code, I've got a better handle on the benchmarks. My first mistake was not fully reading the "many_stats" routine; it uses the List::Util::reduce routine, not the sort routine as I'd assumed/glossed over. So I added a test that did use the worst non-contrived combination of stat and sort, and THAT gave me the results I expected.
             Rate most_stats graff_hash zaxo_first more_stats sk_maxfile the_ls_lrt
most_stats 4.64/s         --       -82%       -84%       -86%       -89%       -97%
graff_hash 25.6/s       452%         --       -14%       -22%       -41%       -82%
zaxo_first 29.8/s       543%        16%         --        -9%       -31%       -79%
more_stats 32.9/s       608%        28%        10%         --       -24%       -76%
sk_maxfile 43.3/s       832%        69%        45%        32%         --       -69%
the_ls_lrt  140/s      2909%       445%       368%       325%       223%         --
Doing two stats for every compare in the sort routine is REALLY bad, and the next worst option is graff's caching of the file dates in a hash, then sorting.
Then we have the two attempts using List::Util::reduce. I don't quite understand how zaxo's caching of dates can be worse than stat'ing during each compare. My only guess is that setting up the two-dimensional array, plus all the dereferencing going on, creates enough of a penalty to outweigh the stat calls.
Then we see that sk's function, which pulls out only the newest file as we go through the array, is a bit better than the reduce options. And finally, letting the system's 'ls' routine do most of the work for us is far and away the best option (ignoring portability issues). Code follows:
-Scott
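The benchmark code itself is not reproduced in this copy of the thread. As a rough sketch only, a comparison of this kind can be set up with the core Benchmark module's cmpthese. The function compare_strategies() and the three labels below are hypothetical and are not the routines measured in the tables above; the sketch also assumes every path passed in exists.

```perl
use strict;
use warnings;
use Benchmark 'cmpthese';
use List::Util 'reduce';

# compare_strategies() and its labels are hypothetical names for illustration.
sub compare_strategies {
    my ( $count, @files ) = @_;
    return cmpthese( $count, {
        # Two stat calls for every comparison made by sort.
        sort_with_stats => sub {
            my ($newest) = sort { (stat $b)[9] <=> (stat $a)[9] } @files;
        },
        # One stat per file, cached in [name, mtime] pairs, then reduce.
        reduce_cached => sub {
            my $best = reduce { $a->[1] < $b->[1] ? $b : $a }
                       map { [ $_, (stat $_)[9] ] } @files;
        },
        # One stat per file, tracking the running maximum.
        single_pass => sub {
            my ( $newest, $max ) = ( undef, -1 );
            for my $f (@files) {
                my $mtime = (stat $f)[9];
                ( $newest, $max ) = ( $f, $mtime ) if $mtime > $max;
            }
        },
    } );
}

# Usage: perl bench.pl /usr/bin/*
compare_strategies( -2, grep { -f } @ARGV ) if @ARGV;
```

A negative count tells cmpthese to run each entry for at least that many CPU seconds, which gives steadier rates than a fixed iteration count when the entries differ a lot in speed.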