jplindstrom has asked for the wisdom of the Perl Monks concerning the following question:

I'm curious about a design decision of File::Find. The default behaviour of File::Find is to chdir into directories. But it can avoid that if you pass the no_chdir option.

I wonder if there is any reason for the default to be to chdir, as opposed to not doing it. Is it a 50/50 call, or is there a particular reason for it?

(The reason I ask is that a long time ago I used File::Find and I tried to limit the number of files found by dying after n found files. In that situation, the side effect became a problem.)

/J

Re: Why does File::Find chdir?
by merlyn (Sage) on Jul 13, 2004 at 02:26 UTC
    It's more efficient to chdir into a directory and then name the files within that directory without having to specify a multiple step path. You can probably even see the difference in a carefully constructed benchmark.

    Hence, using chdir is the default. It's the most efficient.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      You lost me there. When might it be less efficient to "specify a multiple step path"?

      I don't really use File::Find much, so I Super Searched for some example code to try benchmarking, and from what I can tell, all other factors being equal/irrelevant, no_chdir seems to be faster.

      Here's an example from one of your snippets...

      #!/usr/local/bin/perl
      use Benchmark qw(cmpthese);
      use File::Find;

      my %results;
      my $wanted = sub {
          if (-l) {    # it's a symlink
              my ($dev, $ino) = lstat _;    # reuse info from -l
              push @{$results{"$dev $ino"}}, $File::Find::name;
              if (-e) {    # that points somewhere else
                  my ($dev, $ino) = stat _;    # reuse info from -e
                  push @{$results{"$dev $ino"}}, "symlink:$File::Find::name";
              }
          } else {
              my ($dev, $ino) = stat;
              push @{$results{"$dev $ino"}}, $File::Find::name;
          }
      };
      my @dirs = qw(/bin /usr/bin /usr/sbin);    # change this to "/" to do the
      cmpthese(1000, {
          chdir => sub {
              %results = ();
              find { wanted => $wanted }, @dirs;
          },
          no_chdir => sub {
              %results = ();
              find { wanted => $wanted, no_chdir => 1 }, @dirs;
          }
      });
      __END__
      laptop:~> monk.pl
      Benchmark: timing 1000 iterations of chdir, no_chdir...
      chdir: 79 wallclock secs (61.68 usr 13.90 sys + 1.23 cusr 1.03 csys = 77.84 CPU) @ 13.23/s (n=1000)
      no_chdir: 82 wallclock secs (61.94 usr 17.99 sys + 1.07 cusr 1.09 csys = 82.09 CPU) @ 12.51/s (n=1000)
                 Rate no_chdir    chdir
      no_chdir 12.5/s       --      -5%
      chdir    13.2/s       6%       --

        That's not very deep, to go to /usr/bin/foo instead of using chdir first. What I meant was a deep hierarchy, like /usr/local/lib/X11/app_defaults/foo. Every path that has all those steps has to be looked up step-by-step, repeating the same work for the kernel over and over.

        Admittedly, the cost is fairly cheap these days, since modern kernels can cache a lot of the intermediate directories. But it's still a non-zero cost, and while that might not make a difference for a dozen lookups, it will for a thousand lookups.
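One way to probe the deep-hierarchy argument is a rough benchmark sketch like the one below. The depth (20 levels), the file count (200), and the iteration count are arbitrary choices made for illustration, not figures from this thread, and the outcome will depend heavily on the kernel and filesystem caching mentioned above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Cwd qw(getcwd);
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Build a deliberately deep temporary directory and fill it with empty files.
# Depth and file count are illustrative assumptions.
my $root = tempdir(CLEANUP => 1);
my $deep = join '/', $root, map { "level$_" } 1 .. 20;
make_path($deep);

my @names = map { "file$_" } 1 .. 200;
for my $name (@names) {
    open my $fh, '>', "$deep/$name" or die "Cannot create $deep/$name: $!";
    close $fh;
}

my $start = getcwd();

cmpthese(200, {
    # Every test resolves the full multi-step path.
    full_path => sub {
        for my $name (@names) {
            -e "$deep/$name" or die "Missing $deep/$name";
        }
    },
    # One chdir, then single-component lookups, then chdir back.
    chdir_first => sub {
        chdir $deep or die "Cannot chdir to $deep: $!";
        for my $name (@names) {
            -e $name or die "Missing $name";
        }
        chdir $start or die "Cannot chdir back to $start: $!";
    },
});
```

This only times the file tests themselves, so it isolates the path-lookup cost rather than measuring File::Find as a whole.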

        -- Randal L. Schwartz, Perl hacker

Re: Why does File::Find chdir?
by bart (Canon) on Jul 13, 2004 at 10:50 UTC
    "The reason I ask is that a long time ago I used File::Find and I tried to limit the number of files found by dying after n found files. In that situation, the side effect became a problem."
    Use Cwd to get the current directory before the call to File::Find. chdir back to that directory after you've finished.

    File::Find already makes use of Cwd anyway, so there's no extra overhead in loading that module yourself.
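A minimal sketch of that save-and-restore approach might look like this. The helper name find_and_return and the limit of 5 files are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Find;

# Hypothetical helper: run find() and return to the starting directory,
# even if the wanted callback dies part-way through the traversal.
sub find_and_return {
    my ($wanted, @dirs) = @_;
    my $start = getcwd();
    my $ok    = eval { find($wanted, @dirs); 1 };
    my $err   = $@;
    chdir $start or die "Cannot chdir back to $start: $!";
    die $err unless $ok;    # re-throw only after restoring the directory
    return;
}

# Usage: stop after 5 plain files by dying inside the callback.
my @found;
eval {
    find_and_return(sub {
        push @found, $File::Find::name if -f;
        die "limit reached\n" if @found >= 5;
    }, '.');
    1;
} or do {
    die $@ unless $@ eq "limit reached\n";
};
print "$_\n" for @found;
```

The error is captured before the chdir so that a failure to change back cannot mask the original exception message.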

      At the time I was thoroughly confused by the behaviour, which forced me to read the man page a little better. The no_chdir option solved the problem for me nicely.

      (Changing directory there and back seems more brittle than just leaving it alone.)
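For comparison, a sketch of the no_chdir route, assuming a hypothetical helper first_n_files. With no_chdir set, File::Find documents that $_ is the same as $File::Find::name, so the file test needs no chdir and the working directory is never touched.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: collect at most $limit plain files under @dirs.
# The process never leaves its starting directory.
sub first_n_files {
    my ($limit, @dirs) = @_;
    my @found;
    eval {
        find({
            no_chdir => 1,
            wanted   => sub {
                return unless -f $_;    # $_ is the full path here
                push @found, $_;
                die "limit reached\n" if @found >= $limit;
            },
        }, @dirs);
        1;
    } or do {
        # Anything other than our own sentinel is a real error.
        die $@ unless $@ eq "limit reached\n";
    };
    return @found;
}

print "$_\n" for first_n_files(3, '.');
```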

      /J

Re: Why does File::Find chdir?
by IlyaM (Parson) on Jul 13, 2004 at 13:30 UTC
    I'm just guessing, but I highly suspect that chdir by default is there for hysterical reasons. The Perl distro contains a script, find2perl, which converts find command lines to equivalent Perl code. The generated Perl uses File::Find. As this code should emulate find's behaviour, File::Find has to chdir.

    --
    Ilya Martynov, ilya@iponweb.net
    CTO IPonWEB (UK) Ltd
    Quality Perl Programming and Unix Support UK managed @ offshore prices - http://www.iponweb.net
    Personal website - http://martynov.org

Re: Why does File::Find chdir?
by doom (Deacon) on Jul 13, 2004 at 21:10 UTC
    Well, the other day I was just thinking about how I keep getting bitten on code like this:
    # Find all sub-dirs of $dir
    opendir DIR, $dir or die "yaddah:$!";
    @subdirs = grep { -d } readdir(DIR);
    This simple, obvious code doesn't do what I expect. The reason is that I keep forgetting to explicitly do a chdir($dir): the "-d" is looking in the wrong place. So myself, I've been wondering why an "opendir" doesn't do a "chdir" for you...
    You can do the recursive equivalent of that task using File::Find like so:
    use File::Find;
    find( sub { -d && print $File::Find::name, "\n"; }, $dir );
    Without the implicit "chdir" behavior, you'd need to change the "-d" line to:
    -d $File::Find::name && print $File::Find::name, "\n";
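For the non-recursive opendir case, one possible fix that avoids chdir altogether is to prepend the directory before testing each entry. The helper name subdirs_of is hypothetical; this is a sketch, not the only way to do it.

```perl
use strict;
use warnings;

# Hypothetical helper: list the immediate subdirectories of $dir
# without changing the current directory.
sub subdirs_of {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    # readdir returns bare names, so test "$dir/$_" rather than $_,
    # and skip the "." and ".." entries.
    my @subdirs = grep { !/\A\.\.?\z/ && -d "$dir/$_" } readdir $dh;
    closedir $dh;
    return @subdirs;
}

print "$_\n" for subdirs_of('.');
```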

      This is one reason why I prefer glob to opendir:

      my @dirs= grep { -d } glob( "$dir/*" );

      works. It forces you to either chdir or prepend the directory name (and often makes much simpler code).

      You also forgot to filter out "." and ".." in your version. Of course, my version won't report ".cpan" (which might be a bug or a feature, depending), and I wish glob provided an option to make that trivial to do.
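Until such an option exists, a sketch along these lines can pick up dot-directories as well. It assumes File::Glob's bsd_glob and a hypothetical helper name all_subdirs; glob metacharacters in $dir itself would still be interpreted as patterns.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Hypothetical helper: immediate subdirectories of $dir, including
# hidden ones such as ".cpan", but excluding "." and "..".
sub all_subdirs {
    my ($dir) = @_;
    return grep { !m{/\.\.?\z} && -d }
           bsd_glob("$dir/*"), bsd_glob("$dir/.*");
}

print "$_\n" for all_subdirs('.');
```

Because glob returns paths with the directory already prepended, the bare -d test works without any chdir.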

      - tye