in reply to Get folders in a directory with 20k directories?

I agree that something does not “smell right” about this. Is there a way to monitor the network traffic passing between the machines, and/or to observe the behavior of the processes on the other machines that produce and deliver the list?

If such a directory structure were known to perform that egregiously, no one in their right mind would have designed and built such a thing. Ergo, it didn’t. Ergo, something else must be wrong (too) ... something unrelated to the file/directory counts.

Replies are listed 'Best First'.
Re^2: Get folders in a directory with 20k directories?
by MidLifeXis (Monsignor) on Aug 25, 2011 at 13:19 UTC

    Not entirely true.

    If such a directory structure were known to perform that egregiously, no one in their right mind would have designed and built such a thing. Ergo, it didn’t. Ergo, something else must be wrong (too) ... something unrelated to the file/directory counts.

    The classic Unix file system stored directory entries in a linked chain of blocks, so finding any one name meant a linear scan of the chain. A long listing (ls -l, for example) also has to read the inode of each file to pull its permissions, ownership, etc., one extra access per entry. That is what degrades on a large, flat directory. Also, ls sorts its output, so you see nothing until the whole scan finishes.
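    As a rough sketch (mine, not from the thread), the difference is visible from Perl: readdir alone walks only the directory blocks, while an ls -l style listing adds one stat() per entry:

        use strict;
        use warnings;

        my $dir = shift // '.';
        opendir my $dh, $dir or die "opendir $dir: $!";

        # Names only: a single linear scan of the directory blocks.
        my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
        closedir $dh;

        # A long listing also does one stat() per entry -- one inode
        # fetch each, which is what hurts in a large, flat directory.
        for my $name (@names) {
            my @st = stat "$dir/$name" or next;
            printf "%10d  %s\n", $st[7], $name;    # size, then name
        }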

    A common solution is to make the structure more tree-like by hashing the file names into some number of buckets (N), where N is no larger than the number of directory entries that fit in a single directory block on disk, and creating as many levels of directories as you need. For example, with X files (20K in this case) and N = 10 entries per block (a real block holds more, but this keeps the math easy), you need a tree of depth ceil(log(X) / log(N)), here ceil(log(20000) / log(10)) = 5. Qmail uses this scheme for its queue directories.
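    Here is a minimal sketch of that bucketing idea in Perl; the root path, fan-out, and the choice of MD5 as the hash are my illustrative assumptions, not qmail's actual layout:

        use strict;
        use warnings;
        use Digest::MD5 qw(md5_hex);
        use File::Path  qw(make_path);
        use File::Spec;

        # Map a name to a nested bucket path. $buckets is N (the
        # fan-out per level); $depth is ceil(log(X) / log(N)).
        sub bucket_path {
            my ($root, $name, $buckets, $depth) = @_;
            my $hex = md5_hex($name);
            my @parts;
            for my $level (0 .. $depth - 1) {
                # Take successive bytes of the digest, reduced mod N.
                my $byte = hex substr($hex, $level * 2, 2);
                push @parts, $byte % $buckets;
            }
            return File::Spec->catdir($root, @parts);
        }

        # 20K files with 10 entries per block => depth 5, as above.
        my $dir = bucket_path('/tmp/example', 'somefile.dat', 10, 5);
        make_path($dir);    # create the bucket chain as needed
        print "$dir\n";     # e.g. /tmp/example/3/7/0/9/2

    Each file then lives at the bottom of a short chain of small directories, so no single directory ever grows past a block or two.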

    --MidLifeXis