in reply to File::find and skipping directories

Maybe the first step, limiting the search to only those directories in /home with uid >= 500, should just be done with readdir. Once you have the list of home directories to be searched, use File::Find on each of those in turn.
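Something like this sketch, say (untested; the wanted routine here just prints file names, so substitute whatever test you actually need):

    #!/usr/bin/perl
    use strict;
    use File::Find;

    # step 1: readdir picks out the home directories owned by uid >= 500
    chdir "/home" or die $!;
    opendir( H, "." ) or die $!;
    my @dirs = grep { -d and (stat(_))[4] >= 500 } readdir H;
    closedir H;

    # step 2: File::Find descends into each of those in turn
    for my $dir ( @dirs ) {
        find( sub { print "$File::Find::name\n" if -f }, "/home/$dir" );
    }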

Or (to flog a favorite hobby horse of mine) use other tools to do something simple, like "du -k -s $dir" -- I'll bet this turns out to run faster and use less memory than File::Find.

I tried out the following, and I think it does essentially what you're looking for. I didn't benchmark it against a File::Find version of the same task, but in other cases where I have done that benchmarking, File::Find has consistently taken at least a few times longer than a solution that avoids it.

    #!/usr/bin/perl
    use strict;

    # round up the usual suspects...
    chdir "/home" or die $!;
    opendir( H, "." ) or die $!;
    my @homers = grep { -d and (stat(_))[4] >= 500 } readdir H;
    closedir H;

    # track down their disk usage
    open( SH, "| /bin/sh > /tmp/home.scan.$$" ) or die $!;
    print SH "du -k -s $_\n" for ( @homers );
    close SH;

    # read and print the results from worst to nicest
    open( U, "/tmp/home.scan.$$" ) or die $!;
    my %usage = map { (/(\d+)\s+(\S+)/); $2 => $1 } <U>;
    close U;
    print "$_ : $usage{$_}\n" for ( sort { $usage{$b} <=> $usage{$a} } keys %usage );

    # all done
    unlink "/tmp/home.scan.$$";
    exit(0);
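If you want to check the timing claim on your own data, the standard Benchmark module makes the comparison easy enough; a rough sketch (the path and repeat count are placeholders, and the File::Find side tallies file sizes in bytes, which is close enough to what du measures for timing purposes):

    #!/usr/bin/perl
    use strict;
    use Benchmark qw(cmpthese);
    use File::Find;

    my $dir = "/home/someuser";   # placeholder -- pick a real directory

    cmpthese( 20, {
        # walk the tree in Perl, tallying file sizes as we go
        'File::Find' => sub {
            my $total = 0;
            find( sub { $total += -s _ if -f }, $dir );
        },
        # let du walk the tree and just parse its one-line summary
        'du -k -s' => sub {
            my ($kb) = `du -k -s $dir` =~ /^(\d+)/;
        },
    } );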
(Update: in case it's not clear, note that "du" does a recursive tally of the space consumed by a given directory tree; by default it lists every subdirectory and the total data contained within each, and the "-s" option turns off that detail and gives just a bottom-line total for the top-level path. Also, "du" does not follow symbolic links, whether they point to data files or to other directories; I gather that was the OP's intention, but it's not clear to me whether the OP's use of find would follow symlinks that point to directories.)
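If symlink handling matters, File::Find at least leaves it under your control: by default it does not descend into symlinked directories, and passing follow => 1 makes it chase them. A minimal sketch (the path is just a placeholder):

    #!/usr/bin/perl
    use strict;
    use File::Find;

    my $top = "/home/someuser";   # placeholder -- point at a real directory

    # default: symlinked directories are not descended into
    my $bytes = 0;
    find( sub { $bytes += -s _ if -f }, $top );
    print "without following links: $bytes bytes\n";

    # follow => 1 chases symlinks and guards against counting a file
    # twice; follow_fast => 1 is quicker but may revisit files
    $bytes = 0;
    find( { wanted => sub { $bytes += -s _ if -f }, follow => 1 }, $top );
    print "following links: $bytes bytes\n";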