ChrisNutting has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I have a little subroutine that returns "1" if a given directory has any subdirs, and "0" if not. It all works fine, but if I test lots of dirs, it takes ages grinding away at my hard disk... Can anyone think of a better way than this:
sub hasSubs {
    my $dir = shift;
    my $hasSubs=0;
    if (!($dir eq "\\")) { $dir="$dir\\"; }
    opendir (CURRENT, $dir) || die("Cannot open the dir: $dir");
    foreach my $file (readdir(CURRENT)) {
        next if (($file eq ".") || ($file eq ".."));
        if (-d $dir.$file) {
            $hasSubs = 1;
            last;
        }
    }
    closedir CURRENT;
    return $hasSubs;
}
I also tried this:
my @subDirList=`dir /ad /b \"$dir\"`;
if ($#subDirList >= 0) { $hasSubs=1; }
for (my $counter = 0; $counter <= $#subDirList; $counter++) {
    chomp $subDirList[$counter];
    $hasSubs = 1;
}
Which worked better on some dirs, but slower on others :(
                    ("`-''-/").___..--''"`-._
                     `6_ 6  )   `-.  (     ).`-.__.`)
                     (_Y_.)'  ._   )  `._ `. ``-..-'
                   _..`--'_..-_/  /--'_..' ,'
                 (il),-''  (li),'  ((!.-'

Replies are listed 'Best First'.
Re: Testing for existance of subdirectories
by tadman (Prior) on Aug 19, 2001 at 12:31 UTC
    You're actually reading and testing every entry in your directory, which probably accounts for your "disk activity". Directories with more entries will obviously take longer than others, especially heavily loaded ones. There is no built-in "filter" for readdir, to the best of my knowledge, so finding directories with subdirectories is going to be involved, even in the best circumstances.

    Your second solution just reads in everything and then backtracks to find subdirs. If this is faster, then a native implementation of the "dir /ad /b" command might help you out.

    If you query in an overlapping way, such that you are actually testing the same directory twice, caching your findings in a hash may save you the trouble of the test.
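    A minimal sketch of the caching idea, assuming the same directory really does get queried more than once. The names has_subdirs and %has_subdirs_cache are hypothetical, and File::Spec is used to join paths rather than a hard-coded backslash:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical cache: directory path => 1 or 0
my %has_subdirs_cache;

sub has_subdirs {
    my ($dir) = @_;

    # Answer from the cache when this directory was already tested
    return $has_subdirs_cache{$dir} if exists $has_subdirs_cache{$dir};

    opendir(my $dh, $dir) or die "Cannot open the dir: $dir: $!";
    my $found = 0;
    while (defined(my $entry = readdir $dh)) {
        next if $entry eq '.' or $entry eq '..';
        if (-d File::Spec->catdir($dir, $entry)) {
            $found = 1;
            last;    # stop at the first subdirectory
        }
    }
    closedir $dh;

    # Remember the result for later calls
    return $has_subdirs_cache{$dir} = $found;
}
```

    The trade-off is that a cached answer goes stale if subdirectories are created or removed while the script is running.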
      Hurrah!!! have found a solution: (Using a combination of both my attempts) This works with almost no file access at all:
      # needs "use Win32::File;" at the top of the script
      foreach my $file (readdir(CURRENT)) {
          next if (($file eq ".") || ($file eq ".."));
          my $checkDir="$dir\\$file";
          Win32::File::GetAttributes($checkDir, my $attrib);
          if ($attrib == 16) { return 1; }
      }
      Not quite sure what I'm doing with that $attrib, but hey, it works!! Thanks for your help guys (Needed to think in a different direction :)

        I would change that test to:

        if ($attrib & 16) { return 1 }

        What you're doing is testing bit 4 (value 16) of the file attributes, which is probably the "directory" bit. But if this was a read-only, hidden, etc. directory, $attrib would be something else, like 17, and the == test would fail.

        If that doesn't make sense, here's a hint; ask if this isn't too clear:

        You want to test this bit
                v
        16 = 00010000b
        17 = 00010001b
                ^
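        To make the difference between == and & concrete, here is a small sketch. The constant DIRECTORY_BIT and the helper is_directory_attr are hypothetical names; the value 16 is the one used in the thread, and treating it as the "directory" bit is the same assumption as above:

```perl
use strict;
use warnings;

# 16 (0x10) is the value the thread tests for; that it is the
# "directory" bit is an assumption carried over from the reply.
use constant DIRECTORY_BIT => 16;

# Hypothetical helper: 1 when the directory bit is set, whatever
# other attribute bits are set alongside it, else 0.
sub is_directory_attr {
    my ($attrib) = @_;
    return ($attrib & DIRECTORY_BIT) ? 1 : 0;
}

# Compare the exact-match test with the bit test for a few values
for my $attrib (16, 17, 1) {
    printf "%2d = %08bb  ==: %d  &: %d\n",
        $attrib, $attrib,
        ($attrib == DIRECTORY_BIT) ? 1 : 0,
        is_directory_attr($attrib);
}
```

        A value of 17 fails the == test but passes the & test, which is why the bitwise form is the safer one for directories that carry extra attributes.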
Re: Testing for existance of subdirectories
by mugwumpjism (Hermit) on Aug 19, 2001 at 12:27 UTC

    How long is a "long" time?

    Under all UNIX filesystems that I know of, the special files which hold the directory information don't hold any information on the file (just its name and "inode" number), so you'd expect a call to a function like that to take a few disk io's most of the time.

    Under DOS' FAT, the special files which hold the directory information also hold its length, file type, attributes, etc, and are returned by whatever they call FindFirst/FindNext these days. Maybe you still just "mov ah, 4Eh; int 21h".

    So, strictly speaking, the call to (-d $dir.$file) shouldn't need to access the disk, because Perl has just got all of the information from the "FindFirst/FindNext" system calls. My guess is that because Perl was written under Unix, it doesn't expect the information then, and if the win32 port isn't smart enough to cache that information in case of a "stat", then the call might be emulated. This will probably involve calling "FindFirst" again, which would scan the directory again looking for that file, which means you'd get a disk io (or perhaps just a cache access) for every non-directory that comes before the first subdirectory in a directory.

    About the only way I could see around that problem, if my guess is right, would be to call FindFirst and/or FindNext API call directly and process its output yourself. This might not be as daunting as it sounds, though I'll leave it for someone else to help you with that!

    Update: I've just had a thought. Try replacing your loop with:

    my @files = readdir (CURRENT);
    closedir CURRENT;
    foreach (@files) {
        # a bare -d would test $_ relative to the current directory, so prepend $dir
        return 1 if (!/^\.\.?$/ and -d "$dir$_");
    }
    return 0;

    There's a chance that might help.

    Also, under UNIX you can test for whether or not a directory has subdirectories with if ((stat $dir)[3] > 2), so it's not that inefficient :-).
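    A hedged sketch of that link-count test. The helper name has_subdirs_nlink is hypothetical, and the shortcut assumes the traditional Unix convention where a directory has two links plus one per subdirectory. Some filesystems do not follow that convention, so the sketch falls back to scanning when the count is below 2:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper. Element 3 of stat() is the hard link count.
# Assumption: a directory has 2 links ("." and its entry in the
# parent) plus one ".." link for each subdirectory.
sub has_subdirs_nlink {
    my ($dir) = @_;
    my $nlink = (stat $dir)[3];
    die "Cannot stat $dir: $!" unless defined $nlink;

    # Trust the count only when it fits the traditional scheme
    return $nlink > 2 ? 1 : 0 if $nlink >= 2;

    # Otherwise fall back to scanning the entries one by one
    opendir(my $dh, $dir) or die "Cannot open the dir: $dir: $!";
    my $found = 0;
    while (defined(my $entry = readdir $dh)) {
        next if $entry eq '.' or $entry eq '..';
        if (-d File::Spec->catdir($dir, $entry)) {
            $found = 1;
            last;
        }
    }
    closedir $dh;
    return $found;
}
```

    When the shortcut applies, the answer costs a single stat call, however many entries the directory holds.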