in reply to File::Find in a thread safe fashion

If the reason for the threading is performance, then it seems to me that you are recalculating the statvfs structure, which is more readily available via the Filesys::Statvfs module (or one of its brothers).
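A minimal sketch, assuming the CPAN Filesys::Statvfs module is installed (the eval guard keeps it harmless where it is not); statvfs() returns the statvfs(2) fields as a list, the first few of which are shown:

```perl
use strict;
use warnings;

# Sketch: read the filesystem stats for / straight from statvfs(2),
# no directory walking required.
if (eval { require Filesys::Statvfs; 1 }) {
    my ($bsize, $frsize, $blocks, $bfree, $bavail)
        = Filesys::Statvfs::statvfs('/');
    printf "/: %d KB free of %d KB\n",
        $bavail * $frsize / 1024, $blocks * $frsize / 1024;
}
else {
    warn "Filesys::Statvfs not installed: $@";
}
```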

The *nix command df -k uses that same system-internal structure, reporting it back one line per file system as device, blocks, #used, #available, %capacity and mount_point.
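For instance, restricted to the filesystem holding the root directory:

```shell
# One line for the filesystem holding /, in 1K blocks; the columns
# are device, blocks, used, available, capacity and mount point.
df -k /
```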

On a huge Sun Solaris system with hundreds of file systems, I just got the result back from df -k in only 0.02 seconds and would expect a Perl program using such a Filesys module to perform comparably well.

Update: If your needs are indeed limited to what df does, Filesys::Df will be easier to use or Filesys::DfPortable if the code also has to run on any of Mac OS X, Unix, Linux, Windows 95 or later and so on.
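A sketch of the easier interface, assuming the CPAN Filesys::Df module is installed; its df() returns a hashref whose keys follow the df columns (blocks, used, per, and so on, per its docs):

```perl
use strict;
use warnings;

# Sketch: df-style numbers for one mount point, guarded so the
# script still runs where Filesys::Df is not installed.
if (eval { require Filesys::Df; 1 }) {
    my $df = Filesys::Df::df('/', 1024);   # 1K blocks, like df -k
    printf "/: %d KB total, %d KB used, %d%% full\n",
        $df->{blocks}, $df->{used}, $df->{per};
}
else {
    warn "Filesys::Df not installed: $@";
}
```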

More update: In addition, to get per-user, per-filesystem stats, you could enable disk usage quotas (without actually limiting usage) and retrieve that information via the Quota module.
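A sketch along those lines, assuming the CPAN Quota module and quotas enabled on the filesystem; Quota::getqcarg() builds the device argument and Quota::query() returns the block and inode figures, per the module's docs:

```perl
use strict;
use warnings;

# Sketch: per-user usage on one filesystem via the quota subsystem,
# guarded so the script still runs where Quota is not installed.
if (eval { require Quota; 1 }) {
    my $dev = Quota::getqcarg('/home');    # device argument for /home
    my ($blocks_used, $bsoft, $bhard, $btime,
        $inodes_used, $isoft, $ihard, $itime) = Quota::query($dev, $<);
    printf "uid %d uses %d blocks on /home\n", $<, $blocks_used
        if defined $blocks_used;
}
else {
    warn "Quota module not installed: $@";
}
```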

-M

Free your mind


Re^2: File::Find in a thread safe fashion
by Preceptor (Deacon) on Jul 28, 2006 at 12:16 UTC
    Thanks, I'll look into those.

    My needs are _loosely_ what a 'du' does, but a little more complicated - I'm needing (assuming a tree of):

    /usr            10mb
        /local       5mb
            /apache  5mb
        /include     1mb
    /system          1mb
    followed by a little hackery to assign different 'structures' to cost centres.

    So yes, doing a du of /usr, then of /usr/local, then of /usr/local/apache, would be a solution, but then I'd end up reading the tree lots of times, which'd get very expensive.
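The repeated reads can be avoided with a single File::Find pass that credits each file's size to its directory and to every ancestor up to the root. A sketch using only core modules (dir_usage is a hypothetical helper name):

```perl
use strict;
use warnings;
use File::Find;
use File::Basename;

# One pass over the tree: add each file's size to the directory it
# lives in and to every ancestor up to $root, so nothing is re-read.
sub dir_usage {
    my ($root) = @_;
    $root =~ s{/+$}{} unless $root eq '/';   # normalize trailing slash
    my %usage;
    find(sub {
        return unless -f $_;
        my $size = -s _;                     # reuse the stat from -f
        my $dir  = $File::Find::dir;
        while (1) {
            $usage{$dir} += $size;
            last if $dir eq $root;
            $dir = dirname($dir);
        }
    }, $root);
    return \%usage;
}

my $usage = dir_usage(shift @ARGV // '.');
printf "%-40s %d bytes\n", $_, $usage->{$_} for sort keys %$usage;
```

Assigning the 'structures' to cost centres is then just a lookup into the resulting hash, one entry per directory.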
      If each such 'structure' could be put on a separate device partition, either directly or, probably handier, by using symbolic links to isolate it from where it normally lives, then Filesys::Df would still do the job more efficiently than having to recalculate the (f)statvfs yourself.

      Update: The way my own hosting supplier does it is to have a separate partition per client, symbolically link the top directory of each website structure as a subdirectory of where the apache server is installed and alias each website to that directory in the httpd.conf. They have a different location for all the webmail though, just because that has a different tariff per MB.

      I imagine there is a downside: they have to automate partition allocation, and it is hard for customers to give up disk space they have requested, because extending partitions is significantly easier than recycling part of an allocated one, even more so if this has to be automated.

      -M

      Free your mind