in reply to brainteaser: splitting up a namespace evenly

The perfect solution is not going to be easy to produce, and as you add to your data set, it is going to change.

Instead, just find a simple, "good enough" solution. Splitting on every 2 digits sounds like a good one to me. If you are willing to complicate your code, you could make the decision to split on the next 2 digits depend on how full the directory is. For instance, your rule could be, "When the directory goes beyond 100 files, find the starting 2-digit code with the most entries, create that directory, and move those files into it."
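Here is a rough sketch of that rule in Python, just to show how little code it takes. The function name split_if_full, the limit and prefix_len parameters, and the choice to keep each file's full name after moving it are all my own assumptions for illustration, not anything from the original question. Treat it as a starting point rather than a finished tool.

```python
import os
import shutil
from collections import Counter


def split_if_full(directory, limit=100, prefix_len=2):
    """Apply the "split when too full" rule once (hypothetical helper).

    If `directory` holds more than `limit` plain files, move the files that
    share the most common `prefix_len`-character prefix into a subdirectory
    named after that prefix. Returns the prefix used, or None if nothing
    was moved.
    """
    files = [name for name in os.listdir(directory)
             if os.path.isfile(os.path.join(directory, name))]
    if len(files) <= limit:
        return None

    # Count files per leading prefix. Skip names too short to split, and
    # skip any prefix that is itself a file name, since the new directory
    # would collide with that file.
    taken = set(files)
    counts = Counter(
        name[:prefix_len] for name in files
        if len(name) > prefix_len and name[:prefix_len] not in taken
    )
    if not counts:
        return None

    prefix, _ = counts.most_common(1)[0]
    target = os.path.join(directory, prefix)
    os.makedirs(target, exist_ok=True)

    # Assumption: files keep their full names inside the new directory.
    for name in files:
        if len(name) > prefix_len and name.startswith(prefix):
            shutil.move(os.path.join(directory, name),
                        os.path.join(target, name))
    return prefix
```

You would call this after each batch of new files arrives. To go deeper, call it again on the subdirectory with the prefix taken from the next 2 characters; that part is left out here to keep the sketch short.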

An alternative solution (if you are using Linux) is to switch to a filesystem, e.g. ReiserFS, that is designed to handle directories with lots of small files efficiently. That is, instead of coding up a workaround for filesystem issues, use a filesystem that does not have the problem in the first place.
