in reply to Re^6: Splitting up a filesystem into 'bite sized' chunks
in thread Splitting up a filesystem into 'bite sized' chunks

and combining it with File::Find::prune to "skip forwards".

I have no experience (nor knowledge even) of that, so I cannot comment on it.

I'm not really sure why I'm resisting databases,

It wouldn't have to be (nor would it benefit from being) a full RDBMS, but it would need to handle low levels of read contention plus a concurrent writer.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^8: Splitting up a filesystem into 'bite sized' chunks
by Preceptor (Deacon) on Jul 10, 2013 at 22:59 UTC

    File::Find lets you specify "prune" ($File::Find::prune), which enables or disables traversal of the current directory. If you know your last checkpoint was /mnt/myhome/stuff/junk, you can pattern-match each file path and turn off traversal until you get a match. (You may have to roll your checkpoint up a level if the target has been deleted in the interim.)
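    A minimal sketch of that idea, against a throwaway temp tree rather than a real filesystem (the paths, the process step, and the use of preprocess to sort entries are all my assumptions — sorting matters because plain readdir order isn't guaranteed to repeat between runs):

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Throwaway tree standing in for the real filesystem (names are made up).
my $root = tempdir(CLEANUP => 1);
make_path("$root/a", "$root/b", "$root/c");
for ("$root/a/1.txt", "$root/b/2.txt", "$root/c/3.txt") {
    open my $fh, '>', $_ or die "open $_: $!";
    close $fh;
}

# Pretend the previous run died just after handling b/2.txt.
my $checkpoint = "$root/b/2.txt";
my $resumed    = !length $checkpoint;    # no checkpoint => process everything

my @processed;
find({
    # Sort entries so the traversal order is reproducible between runs.
    preprocess => sub { sort @_ },
    wanted     => sub {
        my $path = $File::Find::name;
        unless ($resumed) {
            if (-d $_) {
                # Prune any directory that cannot contain the checkpoint,
                # so finished subtrees are skipped without descending.
                $File::Find::prune = 1
                    unless index("$checkpoint/", "$path/") == 0;
            }
            # Reached the checkpoint: resume with the *next* entry.
            # (If it was deleted in the interim, you'd have to fall back
            # to a coarser checkpoint, as noted above.)
            $resumed = 1 if $path eq $checkpoint;
            return;
        }
        push @processed, $path if -f $_;    # real work would go here
    },
}, $root);

print "$_\n" for @processed;    # only files after the checkpoint
```

    Here only c/3.txt gets processed: a/ is pruned outright, and b is walked just far enough to find the checkpoint.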

    That'll - hopefully - give me a restartable find. Being able to "skip ahead" in future (and thus distribute processing) may require a first pass, and tracking multiple checkpoints.

    Thinking about it, you'd want some sort of start/finish pair per chunk, and some way of compensating for "drift" as files come and go. But one checkpoint every 100k files takes a huge list down to a merely large one. (It still doesn't help your first pass, though, unless you can take some wild guesses at initial checkpoints and apply the same drift compensation.)
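    Harvesting those checkpoints on the first pass can be as simple as recording every Nth file name seen. A sketch, again against a tiny temp tree with a chunk size of 3 standing in for 100k (all names here are invented for the demo):

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Throwaway tree standing in for the real filesystem.
my $root = tempdir(CLEANUP => 1);
for my $n (1 .. 7) {
    open my $fh, '>', "$root/f$n" or die "open: $!";
    close $fh;
}

my $chunk = 3;    # in real use this might be 100_000
my $count = 0;
my @checkpoints;

find({
    preprocess => sub { sort @_ },    # reproducible order between runs
    wanted     => sub {
        return unless -f $_;
        # Record every $chunk-th file seen as a chunk boundary.
        push @checkpoints, $File::Find::name if $count++ % $chunk == 0;
    },
}, $root);

# Each worker can then resume at $checkpoints[$i] and stop once it
# reaches $checkpoints[$i + 1], using the prune trick above.
print "$_\n" for @checkpoints;
```

    With 7 files and a chunk of 3 the boundaries land on f1, f4 and f7; drift compensation (files added or deleted between passes) would still have to be layered on top.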