I've had recent experience of a similar nature on FreeBSD (or maybe it really involves some sort of modified Linux on a particular Sun NAS -- not sure). A directory grew to a few million files, and the processes involved were keeping a separate (MySQL) database of the file names as they were created.

Data processing of file content was driven by getting file names from MySQL. For reasons I don't understand, access to specific files by name was never much of a problem (well, I wasn't in charge of the processes involved, but the person who was apparently didn't have any serious trouble of this sort).

Of course, any other process that relied on a full scan or wildcard search of the directory (e.g. "ls", "find", system backups, anything using readdir, ...) became horribly bogged down: once it went into this directory, it would run for hours before finishing.
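
To illustrate the full-scan case, here is a minimal Python sketch (not what we actually ran; the function name is made up for illustration). It walks a directory one entry at a time with os.scandir instead of building the whole listing in memory first. That keeps the memory footprint flat, though it still has to read every entry, so it will not make a multi-million-file scan fast.

```python
import os

def count_entries(path):
    """Count directory entries one at a time, without building a full list."""
    count = 0
    # os.scandir returns an iterator, so entries are read lazily
    with os.scandir(path) as entries:
        for _ in entries:
            count += 1
    return count
```

Something like count_entries("/some/huge/dir") at least gives you a number without the sorting and buffering that "ls" does by default.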

I think part of the problem was that files were constantly being created at a rate of dozens per minute -- a lot would change in the time it took to do a complete scan. But most of the problem was just the sheer number of files in the directory. (I learned that the memory footprint of FreeBSD "find" grew alarmingly when it had to walk through this thing.)

Interestingly, as we got around to cleaning up the mess (so that a "daily" backup could finish within a day), we found that once we stopped creating new files and reduced the number of standing files to 100,000 or fewer, the response time for full scans became fairly acceptable, even though the disk space consumed by the directory file itself had not been reduced. (Unix/Linux directory files generally do not shrink when the files they contain are deleted.)
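
If you want to watch for this on your own system, a rough Python sketch like the following compares the size of the directory file itself with the number of entries it currently holds. The function name is hypothetical, and what st_size means for a directory depends on the filesystem, so treat the byte figure as a hint rather than a measurement.

```python
import os

def dir_stats(path):
    """Return (size of the directory file in bytes, current entry count)."""
    size = os.stat(path).st_size  # size of the directory itself, not its contents
    with os.scandir(path) as entries:
        n = sum(1 for _ in entries)
    return size, n
```

A large size paired with a small entry count is the situation described above: the files are gone but the directory file never gave the space back.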

I sort of wish I knew more about how the contributing factors interact, but I've concluded from other experiences that reducing the file count is probably the biggest factor.

But the more relevant lesson, I think, is to avoid any processing strategy that would populate a directory like that. Just don't do it.
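
One common way to avoid it is to spread the files over hashed subdirectories. The Python sketch below is only an illustration under my own assumptions (the function names, the choice of SHA-256, and the two-level, two-character layout are not from the setup described above). It derives a stable subdirectory from each file name, so lookup by name still works without a directory scan.

```python
import hashlib
import os

def sharded_path(root, name, levels=2, width=2):
    """Map a file name to root/<h1>/<h2>/name using a hash of the name."""
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    # take `levels` chunks of `width` hex characters as subdirectory names
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, name)

def store(root, name, data):
    """Write bytes to the sharded location, creating subdirectories as needed."""
    path = sharded_path(root, name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(data)
    return path
```

With two levels of two hex characters there are 65,536 leaf directories, so a few million files would average well under a hundred per directory, assuming the hash spreads names evenly.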


In reply to Re: unlink : reclaim inodes, etc.? by graff
in thread unlink : reclaim inodes, etc.? by camelcom
