It's more efficient to chdir into a directory and then name the files within it directly, rather than specifying a multi-step path for every file. You can probably even see the difference in a carefully constructed benchmark.
Hence, using chdir is the default: it's the most efficient.
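Roughly, the difference is between these two patterns (just a sketch of the idea, not File::Find's actual code; $dir and @names are placeholders):

use strict;
use warnings;

my $dir   = '/some/deep/directory/tree';   # hypothetical
my @names = qw(foo bar baz);               # hypothetical

# Without chdir: the kernel walks every path component for each stat
for my $name (@names) {
    my @info = stat "$dir/$name";
}

# With chdir: each stat is a single-component lookup
chdir $dir or die "chdir $dir: $!";
for my $name (@names) {
    my @info = stat $name;
}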
| [reply] |
You lost me there. When might it be less efficient to "specify a multiple step path"?
I don't really use File::Find much, so I Super Searched for some example code to try benchmarking, and from what I can tell, all other factors being equal or irrelevant, no_chdir seems to be faster.
Here's an example from one of your snippets...
#!/usr/local/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Find;

my %results;
my $wanted = sub {
    if (-l) {                          # it's a symlink
        my ($dev, $ino) = lstat _;     # reuse info from -l
        push @{$results{"$dev $ino"}}, $File::Find::name;
        if (-e) {                      # that points somewhere else
            my ($dev, $ino) = stat _;  # reuse info from -e
            push @{$results{"$dev $ino"}}, "symlink:$File::Find::name";
        }
    }
    else {
        my ($dev, $ino) = stat;
        push @{$results{"$dev $ino"}}, $File::Find::name;
    }
};

my @dirs = qw(/bin /usr/bin /usr/sbin); # change this to "/" to do the whole filesystem

cmpthese(1000, {
    chdir => sub {
        %results = ();
        find( { wanted => $wanted }, @dirs );
    },
    no_chdir => sub {
        %results = ();
        find( { wanted => $wanted, no_chdir => 1 }, @dirs );
    },
});
__END__
laptop:~> monk.pl
Benchmark: timing 1000 iterations of chdir, no_chdir...
     chdir: 79 wallclock secs (61.68 usr 13.90 sys + 1.23 cusr 1.03 csys = 77.84 CPU) @ 13.23/s (n=1000)
  no_chdir: 82 wallclock secs (61.94 usr 17.99 sys + 1.07 cusr 1.09 csys = 82.09 CPU) @ 12.51/s (n=1000)
             Rate no_chdir    chdir
no_chdir   12.5/s       --      -5%
chdir      13.2/s       6%       --
| [reply] [d/l] |
That's not very deep; going to /usr/bin/foo instead of chdir'ing first only saves a couple of lookups. What I meant was a deep hierarchy, like /usr/local/lib/X11/app_defaults/foo. Every path with all those steps has to be looked up component by component, repeating the same work in the kernel over and over.
Admittedly, the cost is fairly cheap these days, since modern kernels can cache a lot of the intermediate directory lookups. But it's still a non-zero cost, and while that might not make a difference for a dozen lookups, it will for a thousand.
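To isolate just the path-walking cost, something like this would do (a sketch; the deep path and filename are hypothetical, so point them at something real on your system):

use strict;
use warnings;
use Benchmark qw(cmpthese);
use Cwd qw(getcwd);

my $deep = '/usr/local/lib/X11/app_defaults';   # hypothetical deep directory
my $file = 'foo';                               # hypothetical file within it
my $orig = getcwd();

cmpthese(-3, {
    full_path => sub {
        stat "$deep/$file" for 1 .. 1000;   # re-walks every component each time
    },
    chdir_once => sub {
        chdir $deep or die "chdir $deep: $!";
        stat $file for 1 .. 1000;           # single-component lookups
        chdir $orig or die "chdir $orig: $!";
    },
});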
| [reply] |
The reason I ask is that, a long time ago, I used File::Find and tried to limit the number of files found by dying after n files had been found. In that situation, the chdir side effect became a problem.
Use Cwd to get the current directory before the call to File::Find. chdir back to that directory after you've finished.
File::Find already makes use of Cwd anyway, so there's no extra overhead in using that module yourself.
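Something along these lines (a sketch; the 100-file limit and the starting directory are just for illustration):

use strict;
use warnings;
use Cwd qw(getcwd);
use File::Find;

my $start = getcwd();   # remember where we started
my $count = 0;

eval {
    find( sub {
        die "limit reached\n" if ++$count > 100;   # bail out of the traversal early
        print "$File::Find::name\n";
    }, '/usr/bin' );
};
die $@ if $@ && $@ ne "limit reached\n";   # re-throw anything unexpected

chdir $start or die "chdir $start: $!";    # undo find's chdir side effect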
| [reply] |
I'm just guessing, but I strongly suspect that chdir is the default for hysterical reasons. The Perl distribution contains a script, find2perl, which converts find(1) command lines into equivalent Perl code. The generated Perl uses File::Find, and since that code is supposed to emulate find's behaviour, File::Find has to do the chdir.
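For example (the pattern and filenames are just an illustration):

find2perl /tmp -name '*.bak' -print > reaper.pl
perl reaper.pl

The generated script builds a wanted() routine for File::Find::find, and that routine does its file tests against plain $_, relying on find having chdir'ed into each directory for it.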
| [reply] |
Well, the other day I was just thinking about how I keep getting bitten by code like this:
# Find all sub-dirs of $dir
opendir DIR, $dir or die "yaddah:$!";
@subdirs = grep { -d } readdir(DIR);
This simple, obvious code doesn't do what I expect. The reason is that I keep forgetting to explicitly do a chdir($dir) first: readdir returns bare names relative to $dir, so the "-d" is looking in the wrong place. So myself, I've been wondering why an "opendir" doesn't do a "chdir" for you...
You can do the recursive equivalent of that task using File::Find like so:
use File::Find;
find( sub {
          -d && print $File::Find::name, "\n";
      },
      $dir );
Without the implicit "chdir" behavior, you'd need to change the "-d" line to:
-d $File::Find::name && print $File::Find::name, "\n";
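i.e., with no_chdir the full call would look something like:

use File::Find;
find( { no_chdir => 1,
        wanted   => sub {
            -d $File::Find::name && print $File::Find::name, "\n";
        },
      },
      $dir );

(Under no_chdir, $_ is set to the same value as $File::Find::name, so a plain -d would in fact still work; testing $File::Find::name just makes that explicit.)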
| [reply] [d/l] [select] |
my @dirs = grep { -d } glob( "$dir/*" );

works. glob forces you to either chdir first or put the directory name into the pattern, so the -d test always looks in the right place (and it often makes for much simpler code).
You also forgot to filter out "." and ".." in your version. Of course, my version won't report ".cpan" (which might be a bug or a feature, depending), and I wish glob provided an option that made including dotfiles trivial.
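One workaround, since glob patterns can be space-separated (a sketch):

my @dirs = grep { -d && !m{/\.\.?\z} } glob( "$dir/* $dir/.*" );

The second pattern picks up the dot entries, and the regex then drops the "." and ".." that come along with them.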
| [reply] [d/l] |