mcsonka has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perlmonks I'm using that code for doing a recursive file extraction:
sub readin_directory{ if (defined $_[0]){ my @FILEPATHS = @{$_[0]}; #Argument must be a file with nothin +g without an absolute filename each row my @LIST = (); my @FILELIST = (); foreach my $FILEPATH (@FILEPATHS){ print("\nCurrent Filepath: $FILEPATH\n"); #Open file, store every line in an array index position an +d close file opendir (DIR, $FILEPATH) or die "Couldn't open given direc +tory!"; #Open directory @LIST = readdir(DIR); closedir(DIR) or die "Couldn't close directory!"; #Close d +irectory #Test whether file is an xml-file what is compulsory foreach my $FILE (@LIST){ #Recursively invoke function if current file is a dire +ctory #print("Current File: $FILE\n"); if(-d $FILE){ #Don't wanna have current directory and the one be +low next if $FILE =~ /^./; #print("Current Directory: $FILE\n"); push(@FILELIST,@{&readin_directory($FILE)}); + } elsif ($FILE =~ /\.mp3$/i){ chomp $FILE; if ($FILEPATH =~ /\/$/){ $FILE = $FILEPATH.'/'.$FILE; } else{ $FILE = $FILEPATH.$FILE; } push(@FILELIST, $FILE); } } } print("Warning: Folder seems to be empty!\n") unless(@FILELIST +); return \@FILELIST; #Now return array as reference } else{ print("Error: You must specify at least one input directory!\n +"); return 1; } }
sorry for this huge cutout, but I think it's necessary to understand what I want to do and especially the error may be . Okay, actually the "if(-d ...)" doesn't work well. It only recognizes "." and ".." as directory thus effectively nothing because they will be ignored. But when I put out every filename, the direcotries are listed anyway. So why does this pretendingly easy -d not work? I am seeking this source of all evil for a long time... would be great if someone can give me a solution. Thanks. Marcel

Replies are listed 'Best First'.
Re: Simple Recursion
by Corion (Patriarch) on Nov 22, 2007 at 10:15 UTC

    Without looking further at your code, I think you will likely need to prepend the current path when checking for the file/directory type and when you call your subroutine recursively:

    use File::Spec; ... foreach my $FILE (@LIST) my $FULLFILE = File::Spec->catfile($FILEPATH, $FILE); if(-d $FULLFILE){ #Don't wanna have current directory and the one be +low next if $FILE =~ /^./; #print("Current Directory: $FULLFILE\n"); push(@FILELIST,@{&readin_directory($FULLFILE)}); } ...

    Is there a reason you're not using File::Find, which implements mostly what you've done?

Re: Simple Recursion
by cdarke (Prior) on Nov 22, 2007 at 12:06 UTC
    Bitten by regular expression syntax.
    next if $FILE =~ /^./;

    Should be:
    next if $FILE =~ /^\./;

    In your match you were looking for each filename starting with, not a 'dot', but any single character. Naturally everything matched. Easily done.

    Update:
    Turns out you have a few other issues. As Corian says, you need to prefix $FILE with the directory name. There is another issue as well - for some reason you require a reference to an array to be passed to your subroutine (why?). Here are the relevant lines modified:
    if(-d "$FILEPATH/$FILE"){ #Don't wanna have current directory and the one below next if $FILE =~ /^\./; push(@FILELIST,@{readin_directory(["$FILEPATH/$FILE"])} +); }

    I have not tested this on a wide range of naming conventions. Notice that the argument to the subroutine call (& not required) is in square brackets, to generate a ref to an array.

Re: Simple Recursion
by j1n3l0 (Friar) on Nov 22, 2007 at 11:52 UTC
    You may also want to look at File-Util.

    I haven't used it in anger but it seems to apply to your problem.

    Hope that helps =)


    Smoothie, smoothie, hundre prosent naturlig!
Re: Simple Recursion
by graff (Chancellor) on Nov 26, 2007 at 04:15 UTC
    Corion and cdarke have corrected some important problems with your code, and their advice will probably lead you to find related problems.

    Here's a cleaned up version that should work; it provides a few extras and some changes that you might find useful, in addition to possibly speeding things up, making it more flexible, etc.

    use strict; use Carp; sub readin_dir { my ( $filenames, $paths, $ext ) = @_; if ( ref($filenames) ne 'ARRAY' or ref($paths) ne 'ARRAY' or @$pat +hs == 0 ) { carp( "readin_dir call lacks array_ref(s) for file names and/o +r paths\n" ); return; } for my $path ( @$paths ) { $path =~ s{/+$}{}; # don't need or want trailing slash(es) opendir( my $dh, $path ) or do { warn "readin_dir: open failed for $path\n"; next } +; my @subdirs = (); while ( my $file = readdir( $dh )) { next if ( $file =~ /^\.{1,2}$/ ); if ( -d "$path/$file" ) { push @subdirs, "$path/$file"; } elsif ( $ext eq '' or $file =~ /\.$ext$/ ) { push @$filenames, "$path/$file"; } } closedir( $dh ); readin_dir( $filenames, \@subdirs, $ext ) if ( @subdirs ); } }
    Some points worth noting:
    • Handle calling errors at the beginning, and use Carp for that. (You might want the "croak" function rather than the "carp" function.)
    • The caller should provide a reference to an array for accumulating the list of targeted file paths, so that readin_dir only has to push onto this array (it doesn't create the array).
    • The caller should (optionally) provide a file extension to search for, so that readin_dir can apply to more situations. (It would be easy to make it even more flexible.)
    • When you run into a directory that you cannot open, skip it and move on (rather than dying).
    • When opening a directory (or file) inside a recursive function, it's a good idea to use a lexically scoped scalar for the handle, rather than a bareword typeglob like "DIR" or "FILE". (Not really an issue here, since the handle is closed before the recursive call, but it's a good habit to develop.)
    • You don't have to use chomp on strings returned by readdir
    • Don't create extra arrays when you don't have to.
    The traditional PerlMonks answer to this kind of directory-search problem is to use File::Find or one of its several variants, and in your case, it might be worthwhile to look into that. The API's for those modules tend to be a bit strange, and you may just end up staying with the recursive function, but you might appreciate the extra power and flexibility that the modules can provide, e.g. for dealing with symbolic links on *nix or macosx -- then again, if you're on one of those systems, using the regular unix "find" utility can be even easier, and will most likely be faster.

    Here's a little benchmark that compares the recursive function against both File::Find and a unix "find" command being opened as a file handle. The recursive function came out slowest for me, taking about twice as long as unix "find"; File::Find ended up surprisingly close to (not so much slower than) unix "find" in my case (perl 5.8.8 on macosx).

      sorry, for the time waiting with this reply; I want to say thanks for all your tips and help. btw. I actually know about File::Find. But to my shame I must admit not to get it running so I tended to use the classical way by doing it myself.