in reply to Re^2: Splitting up a filesystem into 'bite sized' chunks
in thread Splitting up a filesystem into 'bite sized' chunks
I'm working on something that uses File::Find to send file lists to another thread (or two) via Thread::Queue. My major requirement is breaking down a 10 TB, 70-million-file monster filesystem
Given the size of your dataset, an in-memory queue is a fatally flawed plan from both a memory-consumption and a persistence/restartability point of view.
I'd strongly advocate putting your file paths into a DB of some kind and having your scanning processes remove them (or mark them done) as they process them.
That way, if any one element of the cluster fails, it can be restarted and pick up from where it left off.
It also lends itself to doing incremental scans in subsequent passes.
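A minimal sketch of that DB-backed queue idea, assuming DBD::SQLite is available. The table name `files`, the `done` flag, and the enqueue/claim/mark-done helpers are all illustrative choices, not anything from the post; a real run would point the DSN at an on-disk file (e.g. `dbname=queue.db`) so the queue survives restarts, whereas `:memory:` is used here only to keep the example self-contained.

```perl
use strict;
use warnings;
use DBI;

# Persistent work queue: one row per file path, with a 'done' flag.
# (Hypothetical schema for illustration.)
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do(q{
    CREATE TABLE files (
        path TEXT PRIMARY KEY,
        done INTEGER NOT NULL DEFAULT 0
    )
});

# Producer side: call this from a File::Find wanted() callback.
# INSERT OR IGNORE makes repeated scans idempotent.
sub enqueue {
    my ($path) = @_;
    $dbh->do('INSERT OR IGNORE INTO files (path) VALUES (?)',
             undef, $path);
}

# Consumer side: fetch a batch of not-yet-processed paths...
sub claim_batch {
    my ($n) = @_;
    return @{ $dbh->selectcol_arrayref(
        'SELECT path FROM files WHERE done = 0 LIMIT ?', undef, $n) };
}

# ...and mark each one done once processed, so a crashed worker can be
# restarted and pick up from where it left off.
sub mark_done {
    my ($path) = @_;
    $dbh->do('UPDATE files SET done = 1 WHERE path = ?', undef, $path);
}

enqueue($_) for qw(/tmp/a /tmp/b /tmp/c);
mark_done($_) for claim_batch(2);
my ($left) = $dbh->selectrow_array(
    'SELECT COUNT(*) FROM files WHERE done = 0');
print "remaining: $left\n";    # prints "remaining: 1"
```

An incremental second pass then falls out for free: re-run the File::Find scan calling enqueue(), and only paths not already in the table get queued.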
Replies are listed 'Best First'.

Re^4: Splitting up a filesystem into 'bite sized' chunks
by Preceptor (Deacon) on Jul 10, 2013 at 19:35 UTC
by BrowserUk (Patriarch) on Jul 10, 2013 at 20:58 UTC
by Preceptor (Deacon) on Jul 10, 2013 at 21:16 UTC
by BrowserUk (Patriarch) on Jul 10, 2013 at 21:46 UTC
by Preceptor (Deacon) on Jul 10, 2013 at 22:59 UTC