annie has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to parse through files in a given directory, and all its subdirectories. I want, however, to not follow the directory if it starts with an underscore. Basically, the algorithm would be:
    check directory name
    if (name !~ /^_/)
        read all files and process
        check subdirectory names
        etc.
I'm guessing that my best bet would be to use File::Find and the 'untaint_skip' looks promising. But I'm not sure how the syntax should look.

Re: Pruning directory searches with File::Find
by broquaint (Abbot) on Jul 25, 2003 at 19:40 UTC
    Being the local File::Find::Rule marketeer, I thought I'd offer this alternative solution:

        use File::Find::Rule;
        my @files = find(
            directory => not_name => qr/^_/,
            in        => \@ARGV,    ## or whatever
        );
        for (@files) { ... }

    See the File::Find::Rule docs for more info on this all-singing, all-dancing module.
    HTH

    _________
    broquaint

      It would seem to me that that particular module's paradigm for extracting files from the file system could pose some serious memory issues if the rules that you specify result in returning nearly all of the files in the tree you specified, which may certainly be the case if you're specifying just a lenient "not" rule for pruning things out. This is akin to slurping an entire file instead of reading it line by line. Often you can get away with it as the file will be of a reasonable length, but sometimes you'll get burned when you try to blast an enormous file into memory. Slurp a short config file, and nobody will notice; slurp a SQL transaction log that hasn't been rotated recently and you could bring the system to its knees. Caveat Slurpor.

        It would seem to me that that particular module's paradigm for extracting files from the file system could pose some serious memory issues if the rules that you specify result in returning nearly all of the files in the tree you specified
        Er yes, but I wouldn't say this is an issue of the module so much as its grand ability to let you get on with it. Much like SQL will allow you to perform a SELECT *, it doesn't necessarily condemn SQL (the various issues of SQL are for another node I'm sure). With great power comes great responsibility and all that :) Anyhow you could always just use the iterative approach like so

            use File::Find::Rule;
            my $dir_rule = rule(
                directory => not_name => qr/^_/,
                start     => \@ARGV,    ## or whatever
            );
            while (my $dir = $dir_rule->match) { ... }

        Lovely.
        HTH

        _________
        broquaint

Re: Pruning directory searches with File::Find
by bluto (Curate) on Jul 25, 2003 at 20:08 UTC
    In the File::Find wanted subroutine ...

        if (-d _ and /^_/) {
            $File::Find::prune = 1;
            return;
        }

    ... untested of course. See "perldoc File::Find".

    Update: Or you may want to use "-d $_" instead.

    bluto
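
A minimal sketch of the `$File::Find::prune` approach as a complete script, using only core modules. The helper name `files_outside_underscore_dirs` is hypothetical, and the sketch assumes File::Find's default behaviour of chdir'ing into each directory, so `$_` is the bare entry name. It uses `-d $_` rather than `-d _`, as the update suggests.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return the plain files under @roots, never
# descending into a directory whose name starts with an underscore.
sub files_outside_underscore_dirs {
    my @roots = @_;
    my @files;
    find(
        sub {
            if ( -d $_ ) {
                # Setting prune tells File::Find not to descend into
                # this directory. A start directory is seen as '.',
                # so it is never pruned here.
                $File::Find::prune = 1 if /^_/;
                return;
            }
            # "_" reuses the stat done by "-d $_" just above.
            push @files, $File::Find::name if -f _;
        },
        @roots,
    );
    return @files;
}

# Prints nothing when no directories are given on the command line.
print "$_\n" for files_outside_underscore_dirs(@ARGV);
```

Only directories are tested against `/^_/`, so a plain file named `_notes.txt` in a normal directory is still returned. Note that `$File::Find::prune` has no effect when the `bydepth` option is in use.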

      actually, you should use:

      use File::Spec::Functions 'catfile';

      ## later, in sub wanted...
      if ( -d catfile( $File::Find::dir, $_ ) && m/\A_/ ) {
          $File::Find::prune = 1;
          return;
      }

      you need to specify the absolute path to the file system object you're accessing. $_ stores the name relative to the current search directory within File::Find. also, File::Spec will give you the platform independence you secretly crave ;P

      but, overall, i'd still suggest broquaint's method. File::Find::Rule makes code like this easier to code, understand, and maintain.

      ~Particle *accelerates*

        I'm not sure why you'd want to do this unless you specified "no_chdir" (see the man page). You are already chdir'd there, so $_ is fine. In any case, you should be able to use $File::Find::name if you are paranoid and skip the catfile().

        bluto
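
To make the `no_chdir` point concrete, here is a rough sketch of the same pruning with `no_chdir => 1`. In that mode `$_` holds the full path (the same value as `$File::Find::name`), so the underscore test has to look at the last path component only. The helper name `files_no_chdir` and the `%is_root` guard are illustrative assumptions, not something from the posts above.

```perl
use strict;
use warnings;
use File::Find;
use File::Basename qw(basename);

# Hypothetical helper: prune underscore directories without letting
# File::Find chdir into each directory.
sub files_no_chdir {
    my @roots   = @_;
    my %is_root = map { $_ => 1 } @roots;
    my @files;
    find(
        {
            no_chdir => 1,
            wanted   => sub {
                my $path = $File::Find::name;    # same as $_ under no_chdir
                if ( -d $path ) {
                    # Test only the final path component, and never
                    # prune a start directory itself.
                    $File::Find::prune = 1
                        if !$is_root{$path} && basename($path) =~ /^_/;
                    return;
                }
                push @files, $path if -f _;      # reuses the stat from -d
            },
        },
        @roots,
    );
    return @files;
}

print "$_\n" for files_no_chdir(@ARGV);
```

With the default chdir behaviour none of this is needed, which is why a plain `-d $_` and `/^_/` are enough there.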

Re: Pruning directory searches with File::Find
by skyknight (Hermit) on Jul 25, 2003 at 19:57 UTC

    I don't think that File::Find is going to let you throw away directory subtrees that you don't like. You could put code into your wanted() subroutine that will ignore files with a path of the form that you describe, but you're going (to the best of my understanding) to end up having File::Find waste time by walking through whole subtrees that you'd rather ignore. You might try using the following idiom in place of File::Find to accomplish what you want...

        use strict;
        use Cwd;

        my $cwd = Cwd::getcwd();
        my $directory = shift(@ARGV) || $cwd;
        $directory = $cwd . '/' . $directory unless $directory =~ /^\//;

        my @queue = ($directory);
        while (@queue) {
            my $node = shift(@queue);
            if (-d $node) {
                opendir(my $dh, $node) or next;    # skip unreadable directories
                push(@queue, map  { $node . '/' . $_ }
                             grep { $_ ne '.' and $_ ne '..' and $_ !~ /^_/ }
                             readdir($dh));
                closedir($dh);
            }
            else {
                do_your_stuff($node);
            }
        }

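    This will do a breadth-first search (entries are taken from the front of the queue and new ones are added at the back) on either the current working directory or the directory that you specify on the command line. It skips every entry whose name begins with _, so the subtrees of such directories are never visited; note that plain files beginning with _ are skipped as well. I hope this helps, and I hope it doesn't turn out to be a horribly convoluted way of doing it if there is an easier way with File::Find.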

    Update: It has come to my attention that exploitation of the $File::Find::prune variable is a much more expeditious way of accomplishing a pruning. I confess! I confess! Now stop minus one-ing me, I admit the error of my ways.

Re: Pruning directory searches with File::Find
by PodMaster (Abbot) on Jul 26, 2003 at 01:38 UTC
    Look up preprocess in File::Find and use it, something like:

        find(
            {
                preprocess => sub {
                    my @foo = grep { ! /^_whatever/ } @_;
                    @foo;
                },
                wanted => \&wanted,
            },
            'rhebarb'
        );
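
A small sketch of the `preprocess` idea applied to the original question, assuming File::Find's default chdir behaviour (so the names handed to `preprocess` can be tested with `-d` directly). The helper name `files_via_preprocess` is hypothetical. Filtering the directory listing removes underscore directories before File::Find ever considers descending into them.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: filter each directory listing so that
# underscore-prefixed directories are never visited.
sub files_via_preprocess {
    my @roots = @_;
    my @files;
    find(
        {
            # Receives the names read from the current directory and
            # returns the ones File::Find should go on to process.
            # Only directories are dropped; files such as "_x.txt" stay.
            preprocess => sub { grep { !( /^_/ && -d $_ ) } @_ },
            wanted     => sub { push @files, $File::Find::name if -f $_ },
        },
        @roots,
    );
    return @files;
}

print "$_\n" for files_via_preprocess(@ARGV);
```

One caveat from the File::Find documentation: `preprocess` is ignored when the `follow` or `follow_fast` options are in effect, in which case `$File::Find::prune` is the way to go.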
