in reply to Splitting up a filesystem into 'bite sized' chunks

I'm thinking in terms of using 'File::Find' to build up a list, but this seems ... well, a bit inefficient to me - traversing a whole directory structure, in order to feed a virus scanner a list that'll then traverse the file structure again. Can anyone offer better suggestions for how to 'divide up' an NFS filesystem, without doing a full directory tree traversal?
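For concreteness, the 'File::Find' approach being weighed up here might look something like the sketch below: one traversal that writes out fixed-size batch lists which separate scanner instances could then work through. The mount point and batch size are placeholders, not anything from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Placeholder mount point and batch size -- adjust to taste.
my $mount = shift // '/mnt/nfs/volume1';
my $size  = 10_000;                 # files per batch list
my (@batch, $n);

# Write one batch of paths to batch_<idx>.lst for a scanner to consume.
sub write_batch {
    my ($idx, $files) = @_;
    open my $fh, '>', "batch_$idx.lst" or die "batch_$idx.lst: $!";
    print {$fh} "$_\n" for @$files;
    close $fh;
}

find(sub {
    return unless -f $_;            # plain files only
    push @batch, $File::Find::name;
    if (@batch >= $size) {
        write_batch(++$n, \@batch);
        @batch = ();
    }
}, $mount) if -d $mount;
write_batch(++$n, \@batch) if @batch;   # flush the final partial batch
```

This is exactly the "traverse once to build a list" pattern the post finds inefficient, of course - it's here only as a baseline to compare the suggestions against.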

Doesn't your virus scanner have a 'scan this file only' option?

Beyond that, I'd look at giving the scanner one drive at a time, rather than (bits of) one file system. Drives come in a small number of fixed sizes, so they'd make the capacity planning for your distributed system fairly simple.

I realise that *nix file systems are logically single entities, but it is surely possible to mount individual drives/RAID units so that each appears as a single subdirectory within the file system.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use every day'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Splitting up a filesystem into 'bite sized' chunks
by Preceptor (Deacon) on Jul 09, 2013 at 21:32 UTC

    Unfortunately, I'm pulling NFS mounts off a NAS. So I can't easily subdivide my volumes. I've got a few places where they're separated into (known size) subdirectories, and that's fine. But almost by the nature of it, the most unwieldy are also the ones that have silly numbers of TB and filecounts within a single structure. I can subdivide the mountpoints, but I'd rather not do it by hand.

    It's a good point though - my scanner probably does have 'per file' scanning, which would mean I could stream a file list from a single source to multiple scanning engines. So perhaps that's the way to go.
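    Streaming one list to several engines could be as simple as partitioning the paths round-robin and forking a worker per partition. In the sketch below, 'scan_one' is a stand-in for whatever per-file invocation the real scanner has - it isn't a command from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Deal a list of paths round-robin into $n bins, one per scanner engine.
sub partition_round_robin {
    my ($n, @paths) = @_;
    my @bins = map { [] } 1 .. $n;
    push @{ $bins[ $_ % $n ] }, $paths[$_] for 0 .. $#paths;
    return @bins;
}

# Fork one child per non-empty bin; each child scans its files in turn.
sub run_scanners {
    my ($cmd, @bins) = @_;
    my @pids;
    for my $bin (@bins) {
        next unless @$bin;
        my $pid = fork;
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {
            system($cmd, $_) for @$bin;   # child: scan this bin's files
            exit 0;
        }
        push @pids, $pid;
    }
    waitpid $_, 0 for @pids;              # wait for every engine to finish
}

# Usage sketch: read the file list on stdin, fan out to 4 engines.
# my @paths = map { chomp; $_ } <STDIN>;
# run_scanners('scan_one', partition_round_robin(4, @paths));
```

    Round-robin keeps the bins roughly even by file count; if file sizes vary wildly you'd want a size-aware split instead.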

    In the grand scheme of things though, the biggest problem isn't so much parallelising the scans on a single filesystem (that'll just create contention) as having a process that can be resumed part way through. It's not such a big deal that it's done within a defined time window; what matters is that I can track progress and ensure everything _does_ get scanned eventually.
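    One low-tech way to get that resumability is an append-only log of completed paths: checkpoint after each successful scan, and on restart skip anything already logged. A sketch, with the scan itself supplied as a callback (the 'scan_file' command in the usage line is purely a placeholder):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;

my $log = 'scanned.log';   # placeholder checkpoint file name

# Load the set of paths a previous run already finished.
sub load_done {
    my %done;
    if (open my $fh, '<', $log) {
        while (my $line = <$fh>) {
            chomp $line;
            $done{$line} = 1;
        }
        close $fh;
    }
    return \%done;
}

# Scan each path via the supplied callback, checkpointing as we go.
sub scan_with_resume {
    my ($scan, @paths) = @_;
    my $done = load_done();
    open my $fh, '>>', $log or die "$log: $!";
    $fh->autoflush(1);                 # so the log survives a mid-run kill
    for my $path (@paths) {
        next if $done->{$path};        # finished on an earlier run
        $scan->($path);
        print {$fh} "$path\n";         # record only after a successful scan
    }
    close $fh;
}

# Usage sketch:
# scan_with_resume(sub { system('scan_file', $_[0]) }, @file_list);
```

    The log doubles as the progress tracker: comparing it against the master file list shows what's left, which covers the "ensure everything does get scanned" requirement too.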