Ritter has asked for the wisdom of the Perl Monks concerning the following question:

I've read the threads about finding directory sizes by using File::Find and so on to add up all file sizes recursively... Totalling the dir sizes becomes a very time-consuming process when the dirs grow large (500GB), far slower than just opening Windows Explorer and checking the dir sizes manually. Therefore, isn't it likely that the size of each dir is saved somewhere, ready to be picked up without needing to recalculate it? Someone mentioned that a Win32 API call could be a way, but unfortunately I have no idea how to do that... Does anyone here know? I'm using WinXP.

thanks,
Ritter

Re: Dirsize using Win32 API call?
by BrowserUk (Patriarch) on Dec 07, 2002 at 18:12 UTC

    If your need is to total the size of an entire drive, then using

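        use Win32::DriveInfo;

        # DriveSpace returns a 7-element list: [5] is total bytes, [6] is total free bytes
        my @info = Win32::DriveInfo::DriveSpace( 'C' );
        printf 'Total:%11d Used:%11d Free: %11d', $info[5], $info[5]-$info[6], $info[6];

        Total: 1076027392 Used: 1040338944 Free:    35688448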

    will return the same numbers as displayed on the properties tab for the specified drive in Windows Explorer. This appears to be almost instantaneous.

    However, if you need to find the size of a subdir tree, then the quickest way is to use Win32::OLE to access the Scripting.FileSystemObject (assuming you have this installed).

        #! perl -sw
        use strict;
        use Win32::OLE;

        my $fs = Win32::OLE->CreateObject('Scripting.FileSystemObject');
        my $folder = $fs->GetFolder('e:/perl');
        print 'e:\perl: ', $folder->size(), ' used', $/;

        e:\perl: 52488590 used

    This still needs to recurse the subdirs, but as it is done from within the OS in C, unsurprisingly it is considerably faster than you can do it yourself from Perl.
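
    For comparison, the do-it-yourself recursion that the question refers to can be sketched with the core File::Find module. This is only a minimal illustration of the slow baseline, not a replacement for the FileSystemObject approach; the name dir_size is made up for this example.

    ```perl
    use strict;
    use warnings;
    use File::Find;

    # dir_size is a hypothetical helper name, not part of any module.
    # It walks $dir recursively and adds up the sizes of all plain files.
    sub dir_size {
        my ($dir) = @_;
        my $total = 0;
        find(
            sub {
                # -f stats the current entry; "-s _" reuses that stat result
                $total += (-s _) || 0 if -f $_;
            },
            $dir
        );
        return $total;
    }

    # Usage: perl dirsize.pl <directory>
    print dir_size($ARGV[0]), " bytes used\n" if @ARGV;
    ```

    Every file costs a stat call from Perl here, which is why this gets slow on trees holding hundreds of gigabytes.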

    One further advantage of using the FileSystemObject is that subsequent calls to the size() function will reflect any changes to the folder or subtree without needing to re-calculate the numbers from scratch.

    You could also look at the Win32::ChangeNotify API for a method of quickly discovering if any changes have occurred in the part of the filesystem you are interested in. The changes that can be monitored include file/directory names, sizes, attributes, security descriptors, and timestamps.
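
    A rough sketch of how Win32::ChangeNotify might be used for this: watch a subtree and only recompute the size when something has changed. The helper name wait_for_change, the example path and the 5-second timeout are assumptions for illustration, and this will only run on Windows with the module installed.

    ```perl
    use strict;
    use warnings;

    # wait_for_change is a hypothetical helper name.
    # Returns true if something under $path changed within $timeout_ms
    # milliseconds, and 0 if the wait timed out.
    sub wait_for_change {
        my ($path, $timeout_ms) = @_;
        require Win32::ChangeNotify;    # loaded lazily; Windows only

        # Second argument 1 = watch the whole subtree.
        # The filter string selects file name, directory name and size changes.
        my $notify = Win32::ChangeNotify->new($path, 1, 'FILE_NAME DIR_NAME SIZE')
            or die "Cannot watch $path: $^E\n";

        my $changed = $notify->wait($timeout_ms);
        $notify->close;
        return $changed;
    }

    # Example (assumed path): re-scan only when the tree has changed.
    if ($^O eq 'MSWin32') {
        print "e:/perl changed, size needs recalculating\n"
            if wait_for_change('e:/perl', 5000);
    }
    ```

    The notification only says that something changed, not what or by how much, so the size still has to be recalculated afterwards.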


    Okay you lot, get your wings on the left, halos on the right. It's one size fits all, and "No!", you can't have a different color.
    Pick up your cloud down the end and "Yes" if you get allocated a grey one they are a bit damp under foot, but someone has to get them.
    Get used to the wings fast 'cos it's an 8 hour day...unless the Governor calls for a cyclone or hurricane, in which case 16 hour shifts are mandatory.
    Just be grateful that you arrived just as the tornado season finished. Them buggers are real work.

Re: Dirsize using Win32 API call?
by jsprat (Curate) on Dec 07, 2002 at 18:15 UTC
    Pure API calls would probably be 20-30 lines long, looping over all files in all subdirectories with the FindNextFile API call. If you can be sure that Windows Script Host is installed (it is on Win98+ & Win2K+, and you can install it on 95 or NT), try using something like this:

        #!perl
        use warnings;
        use strict;
        use Win32::OLE;

        my $fs = Win32::OLE->CreateObject('Scripting.FileSystemObject');
        my $dir = 'c:/s';
        my $dir_obj = $fs->GetFolder($dir);
        print $dir_obj->Size(), " bytes used\n";

        __END__
        Output:
        38521553 bytes used

    HTH
      Great thanks, you are gold! :)

      Ritter
Re: Dirsize using Win32 API call?
by pfaut (Priest) on Dec 07, 2002 at 15:59 UTC

    Now, think about what you're asking Windows to do. Every time you change the allocation for a file, you want it to update that file's header, plus something in the directory the file resides in. Not only that, but you want this to work recursively so you need to update something in each directory on up to the root directory of the drive. That's an awful lot to do and a lot of disk I/O just to add a couple of blocks to a file.

    When you ask for properties of a drive and it shows you the usage, that's for the drive as a whole. Windows does track that information. But if you were to click on 'Program Files', for example, you should notice that Windows goes and does exactly what you're trying to do with File::Find. The usage numbers don't come up immediately. You'll hear a lot of disk I/O going on and the numbers will gradually increment until it has finished scanning the hierarchy.

      One point to differ on: Windows 2k, XP, etc. cache this info, so a second look is fast. You can do the same by using something like Cache::FileCache...

      -Waswas
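
      A minimal sketch of the caching idea, assuming Cache::FileCache (from the Cache::Cache distribution on CPAN) is installed. The namespace 'dirsize', the 10-minute expiry and the name cached_dir_size are arbitrary choices for illustration.

      ```perl
      use strict;
      use warnings;
      use File::Find;
      use Cache::FileCache;

      # Namespace and expiry are assumed values; tune them to taste.
      my $cache = Cache::FileCache->new({
          namespace          => 'dirsize',
          default_expires_in => 600,    # seconds
      });

      # cached_dir_size is a hypothetical helper name.
      # Returns the cached size if present, otherwise scans and stores it.
      sub cached_dir_size {
          my ($dir) = @_;

          my $size = $cache->get($dir);
          return $size if defined $size;

          # Cache miss: do the slow recursive scan once.
          $size = 0;
          find(sub { $size += (-s _) || 0 if -f $_ }, $dir);

          $cache->set($dir, $size);
          return $size;
      }

      print cached_dir_size($ARGV[0]), " bytes used\n" if @ARGV;
      ```

      The trade-off is staleness: until the entry expires, the cached number will not reflect files added or removed since the last scan.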