mav3067 has asked for the wisdom of the Perl Monks concerning the following question:

I am getting ready to release a script I have written, and I am currently removing any of the system-dependent coding practices I may have used due to time constraints. This includes the following block of code:
foreach my $file (`find $dir -type f -name "*.txt"`)
My question is, what is the best method to replace this type of behavior? I have looked at 3 methods so far:

method 1: glob

advantages: built into Perl
disadvantages: ??

method 2: File::Util

advantages: gets both the files and the directories (although for this script I never need the list of directories)
disadvantages:
- does not seem to be a core Perl module, and I would like to keep the number of non-standard modules to a minimum for ease of use
- longer to code than using glob?

method 3: File::Find

advantages: it is a core Perl module
disadvantages: longer to code than using glob?

Any advice on what the best practice(s) are for finding files located in a single (known) directory would be great.
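For reference, the single-directory glob version might look like the following sketch (the directory path is assumed; substitute your own):

```perl
use strict;
use warnings;

# Non-recursive: expand the pattern in one known directory only.
my $dir = '.';    # assumed; substitute the real directory path
foreach my $file ( grep { -f } glob("$dir/*.txt") ) {
    print "$file\n";
}
```

One caveat: Perl's built-in glob splits its argument on whitespace, so a $dir containing spaces needs File::Glob's bsd_glob instead.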

Thanks, Mav3067

Replies are listed 'Best First'.
Re: Glob vs File::Util vs File::Find
by Fletch (Bishop) on Jun 12, 2008 at 16:13 UTC

    glob is not recursive, so it's kind of like asking what kind of socket wrench you need to use instead of a screwdriver to drive in a screw (i.e. similar jobs but different kinds of tools). You can of course reimplement your own recursive routines on top of it but then you're reinventing the wheel that other modules already implement.

    While File::Find may seem "longer to code" it's not that bad; you may find File::Find::Rule more to your liking, but that will introduce an external dependency (albeit a pure Perl one so it's not that bad all things considered).

    Never used the other module you mention so I probably can't really comment one way or the other.

    Update: Let me nitpick myself and amend my first statement to glob is not recursive in and of itself but it will handle multiple directory levels if you give it a path with directory separators. If you really want to you can keep getting deeper and deeper with */*, */*/*, etc until you get back an empty list, but again that's reinventing wheels already solved better elsewhere.
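The deepening-pattern trick described above can be sketched like this (illustrative only, and unlike find it skips dotfiles; as noted, a real recursive walk belongs in File::Find):

```perl
use strict;
use warnings;

my $dir = '.';    # assumed starting directory
my @txt_files;

# Deepen the pattern one level at a time; once a level matches
# nothing at all, no deeper level can match either, so stop.
my $pattern = '*';
while ( my @entries = glob("$dir/$pattern") ) {
    push @txt_files, grep { -f && /\.txt\z/ } @entries;
    $pattern .= '/*';
}
print "$_\n" for @txt_files;
```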

    The cake is a lie.

Re: Glob vs File::Util vs File::Find
by ikegami (Patriarch) on Jun 12, 2008 at 16:36 UTC

The reason File::Find::Rule keeps coming up is that it would be the least work by far. For example,

    foreach my $file (`find $dir -type f -name "*.txt"`)

    would become

    use File::Find::Rule;
    foreach my $file (find->file()->name('*.txt')->in($dir))
Re: Glob vs File::Util vs File::Find
by kyle (Abbot) on Jun 12, 2008 at 16:14 UTC

    I'd use File::Find. It's a core module, so it will always be anywhere you find Perl. It takes more code to write, but it's still not that bad:

    use File::Find;

    sub find_text_files {
        my $dir = shift;
        my @text_files;
        my $tf_finder = sub {
            return if !-f;
            return if !/\.txt\z/;
            push @text_files, $File::Find::name;
        };
        find( $tf_finder, $dir );
        return @text_files;
    }

    (Not tested.)

    It would be a little easier using File::Find::Rule, but that's not core.

Re: Glob vs File::Util vs File::Find
by toolic (Bishop) on Jun 12, 2008 at 16:23 UTC
    The Unix find utility recursively searches through the directory hierarchy; the Perl glob built-in does not.

    The Perl core module, File::Find, is similar to Unix find in that it does a recursive search.

    You are right: as of 5.10.0, File::Util is not a core module.

    Another module to consider is File::Find::Rule, which has a simpler interface, IMO, than File::Find, although it is not a core module.

    The answer might also depend on what you plan to do with the list of files, once you have that list.

Re: Glob vs File::Util vs File::Find
by pc88mxer (Vicar) on Jun 12, 2008 at 17:05 UTC
    I don't have too much more to add except that I laud your decision to replace the use of backticks with an alternate approach. There are just too many pitfalls involved with calling out to an external command and interpreting its output. In particular, using backticks with string interpolation like you are doing is especially problematic. What happens, for example, if $dir contains a space?

    If you have other uses of backticks in your program, I would strongly encourage you to replace them with more robust solutions. I wish I could point you to a good writeup of the best practices in this regard, but unfortunately I don't know of any right now.
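For cases where an external command really must stay, one shell-free option is the list form of pipe open; a sketch (the directory name is assumed, and File::Find remains the portable fix):

```perl
use strict;
use warnings;

my $dir = '.';    # a $dir with spaces would break `find $dir ...`

# The list-form pipe open runs find(1) without a shell, so $dir is
# passed as a single argument no matter what characters it contains.
open my $find, '-|', 'find', $dir, '-type', 'f', '-name', '*.txt'
    or die "Cannot run find: $!";
chomp( my @files = <$find> );
close $find;
print "$_\n" for @files;
```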

Re: Glob vs File::Util vs File::Find
by mav3067 (Beadle) on Jun 12, 2008 at 16:40 UTC
    Thank you all for your replies.

    I just want to clarify that when I was initially using the Unix find I needed it to be recursive, but I have since eliminated the need for a recursive find. Does this make glob more appropriate?

    Cheers, Mav
Re: Glob vs File::Util vs File::Find
by chrism01 (Friar) on Jun 13, 2008 at 06:18 UTC
    Re "finding files located in a single (known) directory"
    Seems to me you could just use the opendir() built-in, possibly with chdir().
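A sketch of the opendir() approach for one known directory (the path is assumed):

```perl
use strict;
use warnings;

my $dir = '.';    # assumed known directory

# One directory, no recursion, no shell: readdir plus a filter.
opendir my $dh, $dir or die "Cannot open $dir: $!";
my @txt_files = grep { /\.txt\z/ && -f "$dir/$_" } readdir $dh;
closedir $dh;
print "$dir/$_\n" for @txt_files;
```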

      A related question: Has anyone benchmarked these against one another?

      I am considering using either File::Util or File::Find to recursively walk a directory tree pulling out file metadata (size, mtime, etc.) with stat(), but I need to use the fastest possible method. Anyone know from previous experience which one that would be?

      Thanks,
      - Zen
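      Benchmarking this yourself is straightforward with the core Benchmark module. A sketch, run against an assumed directory; note that the two entries do different amounts of work (File::Find walks the whole tree while glob here reads one level), so the numbers are only illustrative until you point them at your own data:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Find qw(find);

my $dir = '.';    # assumed tree to measure on

cmpthese( 50, {   # 50 iterations each; use -2 for ~2 CPU seconds
    'File::Find' => sub {
        my @f;
        find( sub { push @f, $File::Find::name if -f }, $dir );
    },
    'glob' => sub {
        # One level only, so not strictly comparable to the
        # recursive File::Find walk above.
        my @f = grep { -f } glob("$dir/*");
    },
} );
```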