in reply to Simple Recursion

Corion and cdarke have corrected some important problems with your code, and their advice will probably lead you to find related problems.

Here's a cleaned-up version that should work. It adds a few extras you might find useful (several starting paths in one call, an optional extension filter, a check on the arguments, a warning instead of a crash when a directory can't be opened), and it may also run a bit faster.

    use strict;
    use Carp;

    sub readin_dir
    {
        my ( $filenames, $paths, $ext ) = @_;
        if ( ref($filenames) ne 'ARRAY' or ref($paths) ne 'ARRAY' or @$paths == 0 ) {
            carp( "readin_dir call lacks array_ref(s) for file names and/or paths" );
            return;
        }
        $ext = '' unless defined $ext;    # no extension means "match every file"
        for my $path ( @$paths ) {
            $path =~ s{/+$}{};  # don't need or want trailing slash(es)
            opendir( my $dh, $path )
                or do { warn "readin_dir: open failed for $path: $!\n"; next };
            my @subdirs = ();
            # test with defined() so that a file named "0" doesn't end the loop
            while ( defined( my $file = readdir( $dh ))) {
                next if ( $file =~ /^\.{1,2}$/ );
                if ( -d "$path/$file" ) {
                    push @subdirs, "$path/$file";
                }
                elsif ( $ext eq '' or $file =~ /\.$ext$/ ) {
                    push @$filenames, "$path/$file";
                }
            }
            closedir( $dh );
            readin_dir( $filenames, \@subdirs, $ext ) if ( @subdirs );
        }
    }

Some points worth noting: The traditional PerlMonks answer to this kind of directory-search problem is File::Find or one of its several variants, and it might be worth looking into in your case. The APIs of those modules tend to be a bit strange, and you may end up staying with the recursive function, but you might appreciate the extra power and flexibility the modules provide, e.g. for dealing with symbolic links on *nix or macosx. Then again, if you're on one of those systems, the regular unix "find" utility can be even easier, and will most likely be faster.
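
In case it helps with getting File::Find going, here is a minimal sketch of how the same search might look with it. The sub name find_files is made up for illustration, and it assumes the extension should be treated as a literal string (readin_dir interpolates it as a regex). The "wanted" callback is an anonymous sub that closes over a lexical @found, so nothing has to live in a global.

```perl
use strict;
use warnings;
use File::Find ();

# find_files is a hypothetical helper, not part of File::Find.
# It returns the plain files under the directories in @$paths whose names
# end in ".$ext", or every plain file when $ext is empty or omitted.
sub find_files {
    my ( $paths, $ext ) = @_;
    $ext = '' unless defined $ext;
    my @found;
    File::Find::find(
        {
            # Called once per directory entry. By default find() chdir()s
            # into each directory, so $_ is the bare file name and
            # $File::Find::name is the path from the starting directory.
            wanted => sub {
                return unless -f $_;
                push @found, $File::Find::name
                    if $ext eq '' or /\.\Q$ext\E\z/;
            },
        },
        @$paths
    );
    return @found;
}

# Example use: list *.pl files under the directories given on the command line.
if (@ARGV) {
    print "$_\n" for find_files( [@ARGV], 'pl' );
}
```

The part that tends to trip people up is that "wanted" takes no arguments and its return value is ignored; everything is passed through $_, $File::Find::name and $File::Find::dir.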

Here's a little benchmark that compares the recursive function against both File::Find and a unix "find" command opened as a file handle. The recursive function came out slowest for me, taking about twice as long as unix "find"; File::Find ended up surprisingly close to unix "find" in my case (perl 5.8.8 on macosx).

(One difference I noticed was that the recursive sub ended up following symbolic links that caused it to count some files twice, whereas File::Find only counted files once, and unix find -- given its default usage -- did not follow symlinks at all. I'm a bit dismayed at having to use global variables inside the File::Find "wanted" function, but apart from that, it does "the right thing" reasonably well.)

    #!/usr/bin/perl
    use strict;
    use Benchmark;
    use File::Find ();

    my @found;
    my $ext = '';
    if ( @ARGV >= 2 and $ARGV[0] eq '-e' ) {
        shift;
        $ext = shift;
    }
    my @paths = ( @ARGV ) ? @ARGV : ( "." );
    die "Usage: $0 [-e ext] path ...\n" unless ( -d $paths[0] );

    timethese( 10, {
        '2File::Find' => \&try_File_Find,
        '1Readin_dir' => \&try_readin_dir,
        '0Unix_find'  => \&try_unix_find,
    } );

    sub try_File_Find
    {
        @found = ();
        File::Find::find( {
            wanted => \&wanted,
            # follow_fast => 1,
        }, @paths );
        print "File::Find found " . scalar(@found) . " matches\n";
    }

    sub try_readin_dir
    {
        @found = ();
        readin_dir( \@found, \@paths, $ext );
        print "readin_dir found " . scalar(@found) . " matches\n";
    }

    sub try_unix_find
    {
        @found = ();
        # list-form pipe open: no shell involved, so paths with spaces are safe
        open( my $find, "-|", "find", @paths, "-type", "f" )
            or die "can't run find: $!\n";
        while (<$find>) {
            chomp;
            push @found, $_ if ( $ext eq '' or /\.$ext$/ );
        }
        close( $find );
        print "unix_find found " . scalar(@found) . " matches\n";
        # print join( "\n", @found, "" );
    }

    sub wanted
    {
        return if ( -d $_ );  # count files only, like the other two methods
        push @found, $File::Find::name if ( $ext eq '' or /\.$ext$/ );
    }

    sub readin_dir
    {
        my ( $filenames, $paths, $ext ) = @_;
        for my $path ( @$paths ) {
            $path =~ s{/+$}{};  # don't need or want trailing slash(es)
            opendir( my $dh, $path )
                or do { warn "readin_dir: open failed for $path\n"; next };
            my @subdirs = ();
            # test with defined() so that a file named "0" doesn't end the loop
            while ( defined( my $file = readdir( $dh ))) {
                next if ( $file =~ /^\.{1,2}$/ );
                if ( -d "$path/$file" ) {
                    push @subdirs, "$path/$file";
                }
                elsif ( $ext eq '' or $file =~ /\.$ext$/ ) {
                    push @$filenames, "$path/$file";
                }
            }
            closedir( $dh );
            readin_dir( $filenames, \@subdirs, $ext ) if ( @subdirs );
        }
    }
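
If the double counting through symbolic links mentioned above is a concern, one possible fix for the recursive function is to test each entry with -l before deciding what to do with it. The following is only a sketch: the name readin_dir_nolinks is invented here, it skips every symlink (to a file or to a directory), which is roughly what "find -type f" does by default, and it assumes the extension is a literal string rather than a regex.

```perl
use strict;
use warnings;

# readin_dir_nolinks is a hypothetical variant of readin_dir that never
# follows or records symbolic links. File names are pushed onto @$filenames.
sub readin_dir_nolinks {
    my ( $filenames, $paths, $ext ) = @_;
    $ext = '' unless defined $ext;
    for my $path (@$paths) {
        # work on a copy so the caller's path list is left untouched
        ( my $dir = $path ) =~ s{(?<=.)/+$}{};
        opendir( my $dh, $dir )
            or do { warn "readin_dir_nolinks: open failed for $dir: $!\n"; next };
        my @subdirs;
        while ( defined( my $file = readdir $dh ) ) {
            next if $file eq '.' or $file eq '..';
            my $full = "$dir/$file";
            next if -l $full;    # skip symlinks to files and to directories
            if ( -d $full ) {
                push @subdirs, $full;
            }
            elsif ( $ext eq '' or $file =~ /\.\Q$ext\E\z/ ) {
                push @$filenames, $full;
            }
        }
        closedir $dh;
        readin_dir_nolinks( $filenames, \@subdirs, $ext ) if @subdirs;
    }
    return;
}

# Example use: count the files under the directories given on the command line.
if (@ARGV) {
    my @files;
    readin_dir_nolinks( \@files, [@ARGV] );
    print scalar(@files), " files found\n";
}
```

Skipping links outright is the simplest choice; following them safely would mean remembering which directories have already been visited, which is the kind of bookkeeping File::Find's follow options handle for you.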

Re^2: Simple Recursion
by mcsonka (Initiate) on Dec 03, 2007 at 23:51 UTC
    Sorry for the delay in replying; I want to say thanks for all your tips and help. BTW, I actually know about File::Find, but to my shame I must admit I couldn't get it running, so I fell back on the classical way of doing it myself.