
File::Find

by Corion (Patriarch)
on Sep 16, 2000 at 12:48 UTC ( [id://32791]=modulereview )

Item Description: Enumerate files and directories in a directory tree

Review Synopsis: Use this module instead of globbing or readdir()

File::Find is the way to go if you want to look at all files in one or more directory trees. File::Find exports two functions, find() and finddepth(). find() takes a hash reference or a code reference as its first parameter, followed by a list of directories where the search starts.
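To make the two calling conventions concrete, here is a minimal sketch. The temporary directory tree and the array names are made up for illustration; only find() and the wanted key of the options hash come from File::Find itself.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Build a small throwaway tree so the sketch runs anywhere (hypothetical layout).
my $root = tempdir(CLEANUP => 1);
make_path("$root/a/b");
for my $file ("$root/top.txt", "$root/a/mid.txt", "$root/a/b/deep.txt") {
    open my $fh, '>', $file or die "Cannot create $file: $!";
    close $fh;
}

# Form 1: a code reference followed by the list of start directories.
my @by_coderef;
find(sub { push @by_coderef, $File::Find::name if -f $_ }, $root);

# Form 2: a hash reference of options; the 'wanted' key holds the callback.
my @by_hashref;
find({ wanted => sub { push @by_hashref, $File::Find::name if -f $_ } }, $root);

print "$_\n" for sort @by_coderef;
```

Both calls visit the same entries. The hash reference form is the one to reach for once you need further options.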

Why use File::Find

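File::Find protects you from a lot of nasty things that happen on filesystems. In its standard configuration it does not follow symbolic links, so your code reference is called once for each directory entry and a symlinked directory cannot send the search into a nasty loop. If you ask it to follow symlinks (the follow option), it keeps track of what it has already visited, so each file is still reported only once even if several symlinks point to it.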

Why avoid File::Find

There is not much reason to avoid File::Find. You might want to avoid it if you only need to read the files in a single directory, without recursing, and you are sure that there can be no symlinks in that directory (for example, because the filesystem doesn't allow symlinks). In that case your code could load faster without it. But I'd file that under premature optimization.
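For comparison, the single-directory case without File::Find might look like the sketch below. It uses only the opendir/readdir/closedir builtins. The temporary directory and file names are invented for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical directory with two files, created so the sketch is runnable.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.txt b.txt)) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}

# Read one directory, no recursion; skip the '.' and '..' entries.
opendir(my $dh, $dir) or die "Cannot open $dir: $!";
my @entries = sort grep { $_ ne '.' && $_ ne '..' } readdir $dh;
closedir $dh;

print "$_\n" for @entries;
```

Note that readdir returns bare names, so you have to prepend the directory yourself before opening or testing anything.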

Caveats

If you are just starting to use File::Find, you have to deal with some idiosyncrasies.

First of all, File::Find uses some "optimization" by default to speed up searches on certain filesystems under Unix. Unfortunately, this "optimization" fails to work on other filesystems, such as the iso9660 filesystem used for CD-ROMs. ncw tells you below what to do about it. In fact, you should always use the code ncw proposes.

In the default configuration, File::Find changes the working directory to each directory it recurses into, and the filename passed to your code in $_ is relative to that directory. Use $File::Find::name to get the complete path, that is, the start directory you passed to find() plus the path below it.
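The difference between $_, $File::Find::dir and $File::Find::name is easiest to see side by side. This is a minimal sketch. The temporary tree and the %default / %no_chdir hashes are made up for illustration, while no_chdir is an option documented by File::Find.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Hypothetical tree: <root>/sub/note.txt
my $root = tempdir(CLEANUP => 1);
make_path("$root/sub");
open my $fh, '>', "$root/sub/note.txt" or die "Cannot create file: $!";
close $fh;

my (%default, %no_chdir);

# Default behaviour: find() chdirs into each directory, so $_ is a bare name.
find(sub {
    return unless $_ eq 'note.txt';
    %default = (name => $_, dir => $File::Find::dir, full => $File::Find::name);
}, $root);

# With no_chdir, the working directory is left alone and $_ holds the full path.
find({
    no_chdir => 1,
    wanted   => sub {
        return unless $File::Find::name =~ /note\.txt\z/;
        %no_chdir = (name => $_, full => $File::Find::name);
    },
}, $root);

print "default:  \$_ = $default{name}, dir = $default{dir}\n";
print "no_chdir: \$_ = $no_chdir{name}\n";
```

If your callback opens files and you would rather not think about the current directory at all, no_chdir is probably the less surprising choice.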

If you don't want to recurse below a certain directory, there is the (not-so-well-documented) $File::Find::prune variable, which you can set to 1 in your code reference to stop recursing into the current directory.
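A minimal sketch of pruning, assuming a made-up tree with a directory called skipme that should not be entered:

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Hypothetical tree with one directory we want to keep out of.
my $root = tempdir(CLEANUP => 1);
make_path("$root/src", "$root/skipme/inner");
for my $file ("$root/src/keep.txt", "$root/skipme/hidden.txt",
              "$root/skipme/inner/deep.txt") {
    open my $fh, '>', $file or die "Cannot create $file: $!";
    close $fh;
}

my @seen;
find(sub {
    if (-d $_ && $_ eq 'skipme') {
        $File::Find::prune = 1;    # do not descend into this directory
        return;
    }
    push @seen, $File::Find::name if -f $_;
}, $root);

print "$_\n" for @seen;
```

The callback still gets called for the skipme directory itself. Setting prune only stops find() from looking inside it.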

Examples

By popular demand, here are some examples of how to use the module. The documentation shows off some interesting code, but it's not helpful if you're looking for something to get started.

A first example prints each filename, both on its own and with the path to the file. The code was stolen from a node by nate.

use strict;
use File::Find;

sub eachFile {
    my $filename = $_;
    my $fullpath = $File::Find::name;
    # remember that File::Find changes your CWD,
    # so you can call open with just $_

    if (-e $filename) {
        print "$filename exists!"
            . " The full name is $fullpath\n";
    }
}

find(\&eachFile, "mydir/");

Replies are listed 'Best First'.
RE: File::Find
by ncw (Friar) on Sep 16, 2000 at 15:01 UTC
    I've found another small problem with File::Find. If you are on a Unix-based platform and you use File::Find on a non-Unix filesystem, e.g. dos, vfat (win9x), iso9660 (CD-ROM) or AFS, then File::Find doesn't work properly. You need to add
    $File::Find::dont_use_nlink = 1;
    to your program and it will work fine. As I understand it, this lowers the efficiency of File::Find because it has to look into each directory to see if there are entries in it rather than just looking at the nlink field in the inode. Non-Unix filesystems (and AFS ;-) don't set this properly.

    This note is based on my experience with File::Find under Linux. It is probably similar under other *nix based systems but since foreign partition mounting is an OS specific thing YMMV.

      Actually the hack that dont_use_nlink disables only speeds up the most basic find operations where you don't care anything about the files to be found and then don't do anything with them. And it breaks File::Find on every platform I've ever used it on, just not on every file system on every platform.

      The default should be made to not use this bad hack on any platform. The days of "most file systems of most Unix systems" supporting this hack have long since passed (it has been a long time since I've seen a Unix system without a CD-ROM drive, just to pick one example).

      Sure, it was a cool hack a long time ago. And it can make just getting a listing of files much, much faster (depending on how your directories are structured). But a good module should err on the side of giving correct results over performance.

              - tye (but my friends call me "Tye")
      If you find an architecture on which you need to set this manually, please report it to the developers. This is supposed to be set automatically on those systems that need it. If it isn't, it's a bug.

      -- Randal L. Schwartz, Perl hacker

        Here is my test prog :-
        use strict;
        use File::Find;

        my $dir = shift;
        my ($with, $without);

        print "Counting files in $dir\n";

        # This is the default on Linux
        $File::Find::dont_use_nlink = 0;
        find(sub { $without++ }, $dir);

        $File::Find::dont_use_nlink = 1;
        find(sub { $with++ }, $dir);

        print "With \$File::Find::dont_use_nlink = 0: $without files found\n";
        print "With \$File::Find::dont_use_nlink = 1: $with files found\n";
        I ran this on a mounted iso9660 disc like this (note if the disc has RockRidge extensions then it works properly!) :-
        $ ./file_find_test.pl /mnt/cdrom 
        Counting files in /mnt/cdrom
        With $File::Find::dont_use_nlink = 0: 29 files found
        With $File::Find::dont_use_nlink = 1: 1300 files found
        
        This was on Linux 2.2.17 with perl 5.00503 with the standard File::Find that comes with the distribution

        I agree with tye's comment here - $File::Find::dont_use_nlink should be 1 on all platforms - the slowdown isn't worth the incompatibilities.

Re: File::Find
by larryl (Monk) on Mar 14, 2001 at 01:13 UTC

    I find myself using File::Find more and more now that I've got the hang of it. Typically you set up like so:

    use File::Find;
    find( \&do_work, $from_dir );
    and do_work() is the place where all the real work gets done.

    A couple caveats that I've found (the hard way...) about what you can do inside do_work():

    • Don't change $_ inside do_work(). If you want to, save a copy on entry and change it back before returning.
    • As Corion mentions, the working directory is changed to each recursed directory under your starting point. If you change directories inside do_work(), save a copy of the current directory on entry and chdir back to it before returning.
    • The usual file test operator caveats apply, for example -f $File::Find::name and -l $File::Find::name are both true if the file is a symlink to another file. If you're interested in symbolic links, test for those first, before you test for file- or directory-ness.
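The first and last caveats above can be sketched as follows. The temporary directory, the file names and the classify() sub are made up for illustration, and the symlink() call is guarded because not every platform supports it.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical directory holding one real file and one symlink to it.
my $root = tempdir(CLEANUP => 1);
open my $fh, '>', "$root/real.txt" or die "Cannot create file: $!";
close $fh;

# symlink() dies on platforms without symlinks, so wrap it in eval.
my $have_link = eval { symlink("$root/real.txt", "$root/link.txt") } ? 1 : 0;

my %kind;

sub classify {
    my $name = $_;    # work on a copy and leave $_ itself untouched

    # Test -l first: -f follows the link, so it is also true for link.txt.
    if    (-l $name) { $kind{$name} = 'symlink' }
    elsif (-d $name) { $kind{$name} = 'directory' }
    elsif (-f $name) { $kind{$name} = 'file' }
}

find(\&classify, $root);

print "$_: $kind{$_}\n" for sort keys %kind;
```

The tests here run on the bare name in $_ because find() has already changed into the directory. Testing $File::Find::name instead only works if that path is still valid from the current directory, for example when the start directory was given as an absolute path.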
