neilwatson has asked for the wisdom of the Perl Monks concerning the following question:

Using File::Find, how can I make it skip whole directories? For example, when recursing through /home, I'd like the process to skip any directory /home/<dir> whose owner's user id is less than 500.

find(\&offenders, $dir);

sub offenders{
    # do not include links
    return if (-l);

    # owner of file but skip
    # if owner is not a user
    # (uid < 500)
    $uid = (lstat($_))[4];
    #return if ($uid < 500);
    if ($uid < 500){
        print "UID is $uid, skipping $File::Find::dir $_\n";
        return;
    }

    # scan only regular files
    if (-f){
        $uname = getpwuid $uid;

        # gather name of file
        $fname = $File::Find::name;

        # size of file (kb)
        $size = (lstat($_))[7];
        $size = int($size/1000);

        # keep running total of each
        # user's space use
        $size{$uname} += $size;
    }
}

This code skips files but not whole directories. What have I done wrong?

Neil Watson
watson-wilson.ca

Replies are listed 'Best First'.
Re: File::find and skipping directories
by Paladin (Vicar) on Jun 01, 2004 at 20:24 UTC
    Add the line:
    $File::Find::prune = 1;
    in your first if block right before the return. This tells File::Find not to recurse into that dir. Check the perldoc for more info.

    You might also want to check if it is indeed a dir in that same if block, instead of just checking the UID.
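
The two suggestions above (set $File::Find::prune, and only for directories) can be sketched as follows. This is a minimal illustration, not the original poster's program: the helper name usage_by_uid and its $min_uid parameter are made up for the example, and it assumes the start directory itself should always be scanned, whoever owns it.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: total bytes per owner uid under $top, skipping
# any directory (other than $top itself) whose owner uid is below $min_uid.
sub usage_by_uid {
    my ($top, $min_uid) = @_;
    my %bytes;

    find(sub {
        return if -l $_;                  # ignore symlinks

        my @st = lstat $_ or return;      # entry vanished or is unreadable
        my ($uid, $size) = @st[4, 7];

        if (-d _) {
            # $_ is '.' only for the start directory; never prune that one
            if ($_ ne '.' && $uid < $min_uid) {
                $File::Find::prune = 1;   # do not descend into this directory
            }
            return;
        }

        $bytes{$uid} += $size if -f _;    # count regular files only
    }, $top);

    return \%bytes;
}
```

Calling usage_by_uid('/home', 500) would then return a hash reference mapping each uid to the bytes held in regular files, leaving out everything beneath directories owned by uids below 500.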

Re: File::find and skipping directories
by cosimo (Hermit) on Jun 01, 2004 at 20:25 UTC
    Try with:
    if ($uid < 500){
        print "UID is $uid, skipping $File::Find::dir $_\n";
        $File::Find::prune = 1;
        return;
    }
    By setting the global $File::Find::prune to a true value, you tell find() not to recurse into the directory currently being examined. There is a (brief) mention of this feature in the File::Find documentation.
Re: File::find and skipping directories
by sacked (Hermit) on Jun 01, 2004 at 20:28 UTC
    You need to set $File::Find::prune:
    if ($uid < 500){
        print "UID is $uid, skipping $File::Find::dir $_\n";
        $File::Find::prune = 1;
        return;
    }
    This is akin to the -prune flag of find(1). The pod for File::Find only gives one example using $File::Find::prune, unfortunately.

    --sacked
Re: File::find and skipping directories
by graff (Chancellor) on Jun 02, 2004 at 03:00 UTC
    Maybe the first step, limiting the search to only those directories in /home with uid >= 500, should just be done with readdir. Once you have the list of home directories to be searched, use File::Find on each of those in turn.

    Or (to beat a favorite dead hobby horse of mine) use other tools to do something simple, like "du -k -s $dir" -- I'll bet this turns out to run faster and use less memory than File::Find.

    I tried out the following, and I think it basically does what you're looking for. I didn't benchmark it against using File::Find to get the equivalent result for this case, but in other cases where I have done the benchmarking, File::Find consistently takes at least a few times longer than a solution that doesn't use it.

    #!/usr/bin/perl
    use strict;

    # round up the usual suspects...
    chdir "/home";
    opendir( H, "." ) or die $!;
    my @homers = grep { ( -d and (stat(_))[4] >= 500 ) } readdir H;
    closedir H;

    # track down their disk usage
    open( SH, "| /bin/sh > /tmp/home.scan.$$" ) or die $!;
    print SH "du -k -s $_\n" for ( @homers );
    close SH;

    # read and print the results from worst to nicest
    open( U, "/tmp/home.scan.$$" ) or die $!;
    my %usage = map { (/(\d+)\s+(\S+)/); $2=>$1 } <U>;
    close U;
    print "$_ : $usage{$_}\n" for ( sort { $usage{$b} <=> $usage{$a} } keys %usage );

    # all done
    unlink "/tmp/home.scan.$$";
    exit(0);
    (update: in case it's not clear, note that "du" does a recursive tally of space consumed by a given directory tree; by default, it lists all subdirectories and the total data contained within each -- the "-s" option turns off the detail and gives just a bottom-line total for the top-level path. Also, "du" does not follow symbolic links, whether these point to data files or other directories; I gather that was the intention of the OP, but it's not clear to me whether the OP's use of find would follow symlinks that point to directories.)
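
The first idea in this reply (pick the home directories with readdir, then run File::Find on each one) might look roughly like the sketch below. It is an illustration only and was not part of the reply: the helper names user_dirs and tree_bytes and the $min_uid parameter are hypothetical, and symlinks are ignored throughout on the assumption that this matches the original poster's intent.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: list the subdirectories of $base whose owner uid
# is at least $min_uid, ignoring symlinks and the '.' and '..' entries.
sub user_dirs {
    my ($base, $min_uid) = @_;
    opendir(my $dh, $base) or die "Cannot open $base: $!";
    my @dirs;
    for my $entry (sort readdir $dh) {
        next if $entry eq '.' || $entry eq '..';
        my @st = lstat "$base/$entry" or next;   # skip entries we cannot stat
        next unless -d _;                        # real directories only
        push @dirs, "$base/$entry" if $st[4] >= $min_uid;
    }
    closedir $dh;
    return @dirs;
}

# Hypothetical helper: bytes held in regular files beneath one tree.
sub tree_bytes {
    my ($dir) = @_;
    my $total = 0;
    find(sub {
        return if -l $_;                         # ignore symlinks
        my @st = lstat $_ or return;
        $total += $st[7] if -f _;                # regular files only
    }, $dir);
    return $total;
}
```

A caller could then loop over user_dirs('/home', 500) and report tree_bytes($dir) for each directory, so no pruning logic is needed inside the wanted subroutine.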
Re: File::find and skipping directories
by neilwatson (Priest) on Jun 02, 2004 at 13:41 UTC
    Thank you for the suggestions. Alas, it seems I am still missing an important piece of knowledge. I changed my code to this:

    find(\&offenders, $dir);

    sub offenders{
        # recurse directories
        $File::Find::prune = 0;

        # there are some shared directories
        # that we do not want to include
        my @dirs = '\/home\/engineering\/dev\/share';
        foreach my $z (@dirs){
            #print "$z eq $File::Find::dir\n";
            return if ($File::Find::dir =~ m/$z/)
        }

        # do not include links
        return if (-l);

        # owner of file but skip
        # if owner is not a user
        # (uid < 500)
        #
        #return if ($uid < 500);
        $uid = (lstat($_))[4];
        if (-d && $uid < 500){
            # check do not recurse these
            # directories
            $File::Find::prune = 1;
            print "UID is $uid, skipping $File::Find::dir, $File::Find::name, $_\n";
            return;
        }

        # scan only regular files
        if (-f){
            $uname = getpwuid $uid;

            # gather name of file
            $fname = $File::Find::name;

            # size of file (kb)
            $size = (lstat($_))[7];
            $size = int($size/1000);

            # keep running total of each
            # user's space use
            $size{$uname} += $size;
        }
    }

    This skips everything. Here is the output:

    UID is 0, skipping /home, /home, .
    Then the program exits.

    Neil Watson
    watson-wilson.ca

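      I suggest explicitly exempting the start directory '/home' so that it is never pruned. Test $File::Find::name rather than $File::Find::dir, because $File::Find::dir is also '/home' for every entry directly under /home:
      if (-d && $uid < 500 and $File::Find::name ne '/home'){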

      As an aside, your status message is redundant:
      print "UID is $uid, skipping $File::Find::dir, $File::Find::name, $_\n";
      __END__
      UID is 0, skipping /home/zackse, /home/zackse/.kde, .kde
      You can simplify this to:
      print "UID is $uid, skipping $File::Find::name\n";
      __END__
      UID is 0, skipping /home/zackse/.kde

      Remember that inside the wanted subroutine (offenders in this case), $_ contains the basename, $File::Find::name contains the full path, and $File::Find::dir contains the current directory.

      --sacked
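
To see those three variables side by side, here is a small sketch that records them for every entry find() visits. The helper name trace_find is made up for this illustration, and it assumes File::Find's default behaviour of changing into each directory (no no_chdir option).

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: record what File::Find exposes for every entry
# under $top. Returns a hash keyed by $File::Find::name whose values
# are [ $File::Find::dir, $_ ] pairs.
sub trace_find {
    my ($top) = @_;
    my %seen;
    find(sub {
        $seen{$File::Find::name} = [ $File::Find::dir, $_ ];
    }, $top);
    return \%seen;
}
```

For the start directory itself, the recorded pair is the start path and '.', which matches the "skipping /home, /home, ." line in the output above: the very first call to the wanted subroutine is for /home, and pruning there stops the whole search.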