Tuna has asked for the wisdom of the Perl Monks concerning the following question:

I have a directory structure, for example:
    /nfs/export/netflow/intermediate/
    /nfs/export/netflow/intermediate/200101-200102
    /nfs/export/netflow/intermediate/200103-200104
    /nfs/export/netflow/intermediate/questionable-200101-200102
    /nfs/export/netflow/intermediate/questionable-200103-200104
    /nfs/export/netflow/intermediate/current@
    /nfs/export/netflow/intermediate/previous@
I want the code to look for any directories beginning with "questionable", and any directories matching the date-range convention yyyymm-yyyymm. I am using File::Find to recurse through the top-level directory. Here's my problem: I need to process files contained in the directories
    /nfs/export/netflow/intermediate/200101-200102
    /nfs/export/netflow/intermediate/200103-200104
differently than the files contained in:
    /nfs/export/netflow/intermediate/questionable-200101-200102
    /nfs/export/netflow/intermediate/questionable-200103-200104
HOWEVER, the files contained in both the "questionable" directories and the "yyyymm-yyyymm" directories are named using the same convention.

File names in either directory will be, for example:
bgp-nexthop.chicago4-cr8.20010307.gz or unknown-prefix.chicago4-cr8.20010307.gz

I know that File::Find will ignore any symlinks by default, which is what I want. But, I'm not sure how to pass separate functions to files, based upon what directory they reside in.
Here's some code:
    #!/usr/local/bin/perl -w
    use strict;
    use File::Find;

    my $intdir = "/nfs/export/netflow/intermediate";
    find(\&get_good_int, $intdir);

    sub get_good_int {
        chomp;
        next if ( $_ =~ /^unknown/);
        my @list = split;
        foreach my $file (@list) {
            print "My filename is: $file\n";
        }
    }
This prints the contents of "intermediate", without recursion. Obviously, the subroutine &get_good_int doesn't do anything useful yet, but I want to make sure that I'm grabbing the correct files first. From reading other posts about File::Find, I think it is wise to mention that I am on a machine running 5.003_26.

Replies are listed 'Best First'.
Re: File::Find Question
by repson (Chaplain) on Mar 14, 2001 at 10:58 UTC
    What you want to start with is perlman:-X, which describes the file test operators for checking file and directory types. These can be used with $_, which is set to the current filename; File::Find will already have chdir'ed into the containing directory, so the tests work on the bare name. $File::Find::name (the full path) and $_ can then be used with common string operations to check whether the file or directory name suits your conditions.
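As a minimal sketch of the point above, the following builds a throwaway tree (the layout is hypothetical, borrowed from the names in the question) and sorts entries into directories and plain files using file tests on $_. It assumes a reasonably modern Perl with File::Temp available, which is not the 5.003_26 mentioned in the question.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Build a small temporary tree so the sketch runs anywhere.
# The directory and file names are illustrative only.
my $top = tempdir(CLEANUP => 1);
mkdir "$top/200101-200102" or die "mkdir: $!";
open my $fh, '>', "$top/200101-200102/bgp-nexthop.chicago4-cr8.20010307.gz"
    or die "open: $!";
close $fh;

my (@dirs, @files);
find(sub {
    # $_ is the bare entry name; the current directory is already
    # $File::Find::dir, so file tests on $_ work without a full path.
    return if -l $_;    # skip symlinks themselves
    if (-d $_) {
        push @dirs, $File::Find::name;     # full path of the directory
    }
    elsif (-f $_) {
        push @files, $File::Find::name;    # full path of the plain file
    }
}, $top);

print "dir:  $_\n" for @dirs;
print "file: $_\n" for @files;
```

Note that the top-level directory itself is visited too (with $_ set to "."), so it appears in @dirs alongside the subdirectory.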

    Here is a possible starting point I put together:

    find( sub {
        if (-d $_) {
            if (/^questionable/) {
                dir_func_1($File::Find::name);
            }
            # anchored at the start: $_ is the bare directory name
            elsif (/^(\d{6}-\d{6})$/) {
                dir_func_2($_, $1);
            }
            # Uncomment following line to prevent further recursion.
            # $File::Find::prune = 1;
        }
        else {
            # you may want further tests before assuming good file
            file_func($_);
        }
    }, $intdir);

    Updated: Changed chmod to chdir, stupid me

Re: File::Find Question
by buckaduck (Chaplain) on Mar 14, 2001 at 20:02 UTC
    There are a few bits of your sample code that don't seem right:
    • There's no need to chomp the filename. (harmless though)
    • It's odd that you should specify $_ in the pattern match, since you appear to understand how to use it as the default for the split command later. (also harmless)
    • Why are you splitting the filename on whitespace? The variable $_ contains only one filename, because the subroutine is being called once for each file. It looks like you think that $_ contains a list of filenames...
    • Even so, I don't see why your code wouldn't recurse. It just doesn't do what you want.
    Your test sub could be written more simply:
    sub get_good_int {
        # return, not next: we are leaving a subroutine, not a loop
        return if /^unknown/;
        print "My filename is: $_\n";
    }
    Something closer to being useful might be this:
    use File::Find;
    use File::Basename;

    find(\&get_good_int, '/nfs/export/netflow/intermediate');

    sub get_good_int {
        return if /^unknown/;

        # If the directory path includes this type of dir
        if ($File::Find::dir =~ m#/questionable-\d{6}-\d{6}#) {
            myfunc1($_);
        }
        # If it contains this other type of dir
        elsif ($File::Find::dir =~ m#/\d{6}-\d{6}#) {
            myfunc2($_);
        }
    }
    buckaduck
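One way to keep the per-directory dispatch testable is to separate the path classification from the traversal. The sketch below is only an illustration of that idea: classify_dir, process_file, and the two handlers are hypothetical names, and the handlers simply label the file they receive rather than doing real work.

```perl
use strict;
use warnings;

# Classify a directory path by the naming convention from the question.
# Returns 'questionable', 'dated', or undef for anything else.
sub classify_dir {
    my ($dir) = @_;
    return 'questionable' if $dir =~ m{/questionable-\d{6}-\d{6}(?:/|$)};
    return 'dated'        if $dir =~ m{/\d{6}-\d{6}(?:/|$)};
    return;
}

# Hypothetical handlers, one per kind of directory.
my %handler = (
    questionable => sub { "questionable: $_[0]" },
    dated        => sub { "dated: $_[0]" },
);

# Pick the handler for the directory a file lives in and apply it.
# Returns nothing when the directory matches neither convention.
sub process_file {
    my ($dir, $file) = @_;
    my $kind = classify_dir($dir);
    return unless defined $kind;
    return $handler{$kind}->($file);
}

print process_file('/nfs/export/netflow/intermediate/200101-200102',
                   'bgp-nexthop.chicago4-cr8.20010307.gz'), "\n";
```

Inside a File::Find callback this would be called as process_file($File::Find::dir, $_), so files directly under the top-level directory fall through untouched.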