Re: brainteaser: splitting up a namespace evenly

You can do this either by sampling the population randomly or by using the entire set as the base. Either way, a straightforward approach (let's assume the whole set) would be:

Decide on the optimum number of files (or subdirectories) per directory; just make up a number. Let's say 50, for example.
Load up an array with all of your data. Let's say there are 60000 ISBNs.
Find the integer root of 60000 that comes closest to 50. The square root is about 245 and the cube root is roughly 39 (the steps below use 38 as a round figure), so make your directory depth 3 (the cube root).
Sort your list (or your samples)
Split your list into 38 sub-list ranges. Since the list is sorted, the first element in each of these 38 sections is the lower bound for that section and the last is the upper bound. This is your first-level directory.
Split each of those into 38 sub-list ranges again. The first and last elements of each of these sublists give the range of files accepted by that second-level directory. (These last two steps are nicely recursive...)
If you used a sample, you're going to need a sample big enough to supply all 38^2 = 1444 second-level ranges, so it has to be comfortably larger than that.
Distribute everything in between into the matching second-level directories. You should now have 38 directories, each with 38 subdirectories, each holding roughly 40 files (60000 / 1444 is about 41.6).
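
The steps above can be sketched in a few lines of Python. This is only an illustration under some assumptions, not a finished tool: the function names (pick_depth, split_even, build_tree, path_for) are made up for this example, the keys are placeholder zero-padded numbers rather than real ISBNs, and directories are named after their "low-high" key range. Note that the sketch computes a fan-out of 39 for 60000 items, close to the 38 used in the steps above.

```python
import bisect


def pick_depth(total, target, max_depth=10):
    """Return (depth, fanout) where the depth-th root of `total` is closest to `target`."""
    best = None
    for depth in range(1, max_depth + 1):
        root = total ** (1.0 / depth)
        if best is None or abs(root - target) < abs(best[1] - target):
            best = (depth, root)
    depth, root = best
    return depth, max(round(root), 1)


def split_even(items, parts):
    """Split a non-empty sorted list into contiguous chunks of near-equal size."""
    parts = min(parts, len(items))
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def build_tree(items, fanout, depth):
    """Recursively split sorted items into (low, high, children) range nodes.

    With depth 3 this gives two directory levels; the innermost
    `children` is the plain list of keys (files) in that range.
    """
    if depth == 1:
        return items
    return [
        (chunk[0], chunk[-1], build_tree(chunk, fanout, depth - 1))
        for chunk in split_even(items, fanout)
    ]


def path_for(key, tree, depth):
    """Find the directory path for `key` by bisecting on each level's lower bounds."""
    parts, node = [], tree
    for _ in range(depth - 1):
        lows = [child[0] for child in node]
        # Last range whose lower bound is <= key; clamp keys below every range.
        i = max(bisect.bisect_right(lows, key) - 1, 0)
        low, high, children = node[i]
        parts.append(f"{low}-{high}")
        node = children
    return "/".join(parts + [key])


if __name__ == "__main__":
    # Placeholder data: 60000 sorted, zero-padded keys standing in for ISBNs.
    keys = [f"{n:09d}" for n in range(0, 60000 * 7, 7)]
    depth, fanout = pick_depth(len(keys), 50)
    tree = build_tree(keys, fanout, depth)
    print(depth, fanout, len(tree))
    print(path_for(keys[12345], tree, depth))
```

Because path_for only looks at the lower bound of each range, a key that arrives later and falls in a gap between two ranges still lands in the nearest range below it. The ranges are fixed once built, so a skewed stream of new keys would eventually unbalance the tree and call for a rebuild.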
