in reply to Re: how to list the files in dir with respect to time
in thread how to list the files in dir with respect to time

Ok, after sleeping on it, and re-examining the code, I've got a better handle on the benchmarks.

My first mistake was not fully reading the "many_stats" routine; it is using the List::Util::reduce routine, not the sort routine as I'd assumed/glossed over. So, I added a test that did use the worst non-contrived combination of stat and sort, and THAT gave me the results I expected.

              Rate most_stats graff_hash zaxo_first more_stats sk_maxfile the_ls_lrt
most_stats  4.64/s         --       -82%       -84%       -86%       -89%       -97%
graff_hash  25.6/s       452%         --       -14%       -22%       -41%       -82%
zaxo_first  29.8/s       543%        16%         --        -9%       -31%       -79%
more_stats  32.9/s       608%        28%        10%         --       -24%       -76%
sk_maxfile  43.3/s       832%        69%        45%        32%         --       -69%
the_ls_lrt   140/s      2909%       445%       368%       325%       223%         --

Doing two stats for every compare in the sort routine is REALLY bad, and the next worst option is graff's caching the file dates in a hash, then sorting.

Then we have the two attempts using List::Util::reduce. I don't quite understand how zaxo's caching of dates can be worse than stat'ing during each compare. My only guess is that setting up the 2-dimensional array, plus all the dereferencing that goes on, creates enough of a penalty to outweigh the extra stat calls.
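
One way to put rough numbers on that guess is to count how many mtime lookups each strategy makes. The sketch below is only an illustration: fake_mtime is a hypothetical stand-in for (stat $file)[9], and the 1000 integer "files" are made up, so it measures call counts, not time.

```perl
use strict;
use warnings;
use List::Util 'reduce';

# Hypothetical stand-in data: 1000 fake "files", identified by number.
my @files = ( 1 .. 1000 );

my $calls = 0;

# Hypothetical stand-in for (stat $file)[9]: returns a distinct
# pseudo-mtime per file and counts how often it is asked.
sub fake_mtime { $calls++; return ( $_[0] * 856 ) % 1009 }

# Two lookups per compare inside sort (the most_stats pattern).
$calls = 0;
my @sorted = sort { fake_mtime($b) <=> fake_mtime($a) } @files;
my $sort_calls = $calls;

# Two lookups per compare inside reduce (the more_stats pattern).
$calls = 0;
my $newest = reduce { fake_mtime($a) < fake_mtime($b) ? $b : $a } @files;
my $reduce_calls = $calls;

# One lookup per file, cached in [ name, mtime ] pairs (the zaxo_first pattern).
$calls = 0;
my @pairs = map { [ $_, fake_mtime($_) ] } @files;
my $cached_calls = $calls;

printf "sort: %d  reduce: %d  cached: %d\n",
    $sort_calls, $reduce_calls, $cached_calls;
```

For n files, reduce calls its block n-1 times, so the uncached reduce makes 2(n-1) lookups against n for the cached version. Caching can save at most about half of the stat calls there, which would leave room for the cost of building n anonymous arrays to eat the gain. In the sort case the number of compares grows roughly as n log n, so caching has far more to win.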

Then we see that sk's function to pull only the newest file out as we're going through the array is a bit better than the reduce options. And finally letting the system's 'ls' routine do most of the work for us is far and away the best option (ignoring portability issues).
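
The comment in sk's routine notes that seeding $max from the first file's mtime would be better than a magic number. Here is a minimal sketch of that variant. It assumes a hypothetical helper name, newest_file, which is not in the original code, and it assumes that entries which cannot be stat'ed should simply be skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: return the most recently modified path from a list,
# or undef if the list is empty or nothing in it can be stat'ed.
sub newest_file {
    my @files = @_;
    my ( $newest, $max );
    for my $file (@files) {
        my $mtime = ( stat $file )[9];
        next unless defined $mtime;    # stat failed, skip this entry
        # The first readable file seeds $max; on ties the later file wins,
        # matching the "<=" test in the original loop.
        if ( !defined $max || $mtime >= $max ) {
            ( $newest, $max ) = ( $file, $mtime );
        }
    }
    return $newest;
}

my $path   = shift || '/usr/bin/*';
my $newest = newest_file( glob $path );
print defined $newest ? "$newest\n" : "no files matched\n";
```

This is still a single pass with one stat per file, so it should sit in the same ballpark as sk_maxfile, though I have not benchmarked it.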

Code follows:

use strict;
use warnings;
use List::Util 'reduce';
use Benchmark qw(cmpthese);

our $path = shift || '/usr/bin/*';

sub badsort {
    my @list = sort { (stat $b)[9] <=> (stat $a)[9] } glob "$path";
    my $newest = $list[0];
}

sub goodsort {
    my %file_date;
    for ( glob "$path" ) {
        $file_date{$_} = (stat)[9];
    }
    my $newest = ( sort { $file_date{$b} <=> $file_date{$a} } keys %file_date )[0];
}

sub badreduce {
    my $newest = reduce { (stat $a)[9] < (stat $b)[9] ? $b : $a } glob "$path";
}

sub goodreduce {
    my $newest = ( reduce { $a->[1] < $b->[1] ? $b : $a }
                   map { [ $_, (stat)[9] ] } glob "$path" )->[0];
}

sub d {
    my %file_date;
    my $max = -99999999;  # set it to first file's mtime would be better
                          # but just for demonstration here
    my $mtime;
    my $file;
    for ( glob "$path" ) {
        $mtime = (stat)[9];
        if ($max <= $mtime) {
            $file = $_;
            $max = $mtime;
        }
    }
    my $newest = $file;
}

sub e {
    my @array = `ls -lrt $path`;
    my $newest = $array[-1];
}

cmpthese(250, {
    zaxo_first => \&goodreduce,
    graff_hash => \&goodsort,
    more_stats => \&badreduce,
    most_stats => \&badsort,
    sk_maxfile => \&d,
    the_ls_lrt => \&e,
});

-Scott