Hi,
I'm trying to check whether a file is in a directory with a lot of files (about 15,000). This is done often, and I can't keep my filehandle or cache the result, so I need to check the dir from scratch every time.
The script wasn't originally written by me and uses `ls ...`, which I figured should be slower than opendir(), but it seems it's not. opendir beats ls on directories containing a small number of files but is slower on directories with many files. Can anyone explain to me how shelling out can be quicker than opendir and grep?
What I find really confusing is that reading/rewinding an already open directory isn't much faster than open/close.
The output I get is:
jmo@foo:~> ls /some/small/dir|wc -l
322
jmo@foo:~> ls /some/large/dir|wc -l
12337
jmo@foo:~> perl ls.pl
Benchmark: timing 5000 iterations of allready opendir on small dir, ls on small dir, opening dir on small dir...
allready opendir on small dir: 3 wallclock secs ( 2.83 usr + 0.50 sys = 3.33 CPU) @ 1501.50/s (n=5000)
ls on small dir: 30 wallclock secs ( 0.60 usr 2.48 sys + 11.90 cusr 16.85 csys = 31.83 CPU) @ 1623.38/s (n=5000)
opening dir on small dir: 4 wallclock secs ( 3.31 usr + 0.61 sys = 3.92 CPU) @ 1275.51/s (n=5000)
Benchmark: timing 5000 iterations of allready opendir on large dir, ls on large dir, opening dir on large dir...
allready opendir on large dir: 102 wallclock secs (87.82 usr + 13.04 sys = 100.86 CPU) @ 49.57/s (n=5000)
ls on large dir: 57 wallclock secs ( 0.51 usr 1.99 sys + 27.62 cusr 27.09 csys = 57.21 CPU) @ 2000.00/s (n=5000)
opening dir on large dir: 101 wallclock secs (88.11 usr + 12.37 sys = 100.48 CPU) @ 49.76/s (n=5000)
The code I run looks like this:
use Benchmark;
use strict;
my $dir = '/some/small/dir';
my $file = 'somefile*';
opendir (F, $dir);
timethese(5000, {
    'ls on small dir' => \&foo,
    'allready opendir on small dir' => \&bar,
    'opening dir on small dir' => \&baz,
});
$dir = '/some/large/dir';
$file = 'somefile*';
print "\n\n\n";
closedir F;
opendir (F, $dir);
timethese(5000, {
    'ls on large dir' => \&foo,
    'allready opendir on large dir' => \&bar,
    'opening dir on large dir' => \&baz,
});
closedir F;
sub foo
{
# Portable redirect: &> is a bash extension and is parsed differently by a plain POSIX sh
my $out = `ls $dir/$file >/dev/null 2>&1`;
}
sub bar
{
my (@files) = (grep (/$file/, readdir(F)));
rewinddir (F);
}
sub baz
{
opendir (DIR, $dir);
my (@files) = (grep (/$file/, readdir(DIR)));
closedir DIR;  # close the handle opened in this sub, not the shared F
}
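
For comparison, here is a minimal sketch of how the readdir check could stop at the first match instead of building the full list with grep. The helper name `dir_has_match`, the temporary test directory and the anchored pattern `qr/^somefile/` are assumptions for illustration and not part of the original script. Note that the shell glob `somefile*` and the regex `/somefile*/` do not mean the same thing, so the sketch uses a regex that I assume matches the intent of the glob.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: returns 1 as soon as one entry in $dir matches $re, else 0.
sub dir_has_match {
    my ($dir, $re) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my $found = 0;
    # Read one entry at a time so the loop can stop early
    while (defined(my $entry = readdir $dh)) {
        if ($entry =~ $re) {
            $found = 1;
            last;
        }
    }
    closedir $dh;
    return $found;
}

# Demo on a throwaway directory (assumption: stands in for /some/small/dir)
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(somefile1 otherfile)) {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $dir/$name: $!";
    close $fh;
}

# Precompiled, anchored regex; assumed equivalent of the glob somefile*
my $re = qr/^somefile/;
print dir_has_match($dir, $re)          ? "found\n" : "not found\n";
print dir_has_match($dir, qr/^missing/) ? "found\n" : "not found\n";
```

Whether this is faster than `ls` on the large directory is something I have not measured; it only avoids building the full entry list and recompiling the pattern on each call.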