1. Which should be faster for accessing the user dir with ID 748332: /74/83/32/748332/ or /7/4/8/3/3/2/748332?

    The reasoning behind placing files in a directory structure formed by partitioning the name-space is to avoid huge numbers of files in a single directory, which (on some file systems) has to be searched linearly.

    E.g. if your 6-digit ID defines your ID space, then placing 1 million files in a single directory means that, on average, you have to inspect 500,000 entries to find the file you are looking for.

    But if you split that into /xx/yy/zz.dat, each level holds at most 100 entries, so on average you will inspect 50 entries in the first level, 50 in the second and 50 in the final level. 150 inspections vs. 500,000 is a good trade.
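    A minimal sketch of that /xx/yy/zz.dat mapping, in Python for illustration. The `data` root directory, the `.dat` suffix and the name `id_to_path` are assumptions made for the example, not anything the schemes above require.

```python
import os


def id_to_path(user_id: int, root: str = "data") -> str:
    """Map a 6-digit ID to root/xx/yy/zz.dat, two digits per level.

    Hypothetical helper for illustration: `root` and the ".dat" suffix
    are arbitrary choices.
    """
    if not 0 <= user_id <= 999_999:
        raise ValueError("ID must fit in 6 digits")
    digits = f"{user_id:06d}"  # zero-pad so small IDs still give 3 chunks
    top, mid, leaf = digits[0:2], digits[2:4], digits[4:6]
    # Each level can only hold 00..99, i.e. at most 100 entries.
    return os.path.join(root, top, mid, leaf + ".dat")


print(id_to_path(748332))  # data/74/83/32.dat (with "/" separators on POSIX)
print(id_to_path(42))      # data/00/00/42.dat
```

    The file name only needs the last two digits, because the two directories above it already encode the first four.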

    Using (a modified version of) your second schema, /p/q/r/x/y/z.dat, each level holds at most 10 entries, so on average it will be 5 inspections in each of the 6 levels, giving 30 inspections.
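    The arithmetic generalises to any chunk width. The Python sketch below (the function name is made up) uses the same rough model as the figures above: every directory is searched linearly and, on average, half of its entries are inspected, so a level that splits off `width` digits costs 10**width / 2 inspections.

```python
def avg_inspections(total_digits: int, width: int) -> float:
    """Approximate average directory entries scanned to find one ID.

    Assumes each directory is searched linearly and that, on average,
    half of its entries are inspected before the match. Each level
    splits off `width` digits, so it holds up to 10**width entries.
    """
    if total_digits % width:
        raise ValueError("width must divide the number of digits evenly")
    levels = total_digits // width
    return levels * (10 ** width) / 2


# Prints 500000, 1000, 150 and 30 inspections respectively.
for width in (6, 3, 2, 1):
    levels = 6 // width
    print(f"{levels} level(s) of {10 ** width:>7} entries: "
          f"{avg_inspections(6, width):>9.0f} inspections")
```

    The model counts only entry comparisons; it ignores the fixed cost of descending into each extra directory level, which is one reason the 30-inspection figure may be optimistic in practice.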

    The latter sounds like a good idea, but in practice the benefits can be outweighed by the complexities. This depends upon the actual file-system in use, and you will need to test to see what works best on your particular file-system.
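    Because the answer depends on the file-system, a quick measurement is worth more than any model. A rough Python sketch follows; the file count, the random ID sample and the `flat`/`nested` layouts are assumptions for illustration. Freshly created files will usually still be in the OS's caches, so this measures warm lookups; for meaningful numbers, use realistic sizes on the target file-system and clear caches between runs.

```python
import os
import random
import tempfile
import time


def flat_path(root: str, user_id: int) -> str:
    """Every file in one directory: root/xxyyzz.dat."""
    return os.path.join(root, f"{user_id:06d}.dat")


def nested_path(root: str, user_id: int) -> str:
    """Partitioned layout: root/xx/yy/zz.dat (two digits per level)."""
    d = f"{user_id:06d}"
    return os.path.join(root, d[0:2], d[2:4], d[4:6] + ".dat")


def build(paths: list[str]) -> None:
    """Create an empty file at each path, making parent directories as needed."""
    for p in paths:
        os.makedirs(os.path.dirname(p), exist_ok=True)
        open(p, "w").close()


def time_lookups(paths: list[str]) -> float:
    """Return the seconds taken to os.stat() every path once."""
    start = time.perf_counter()
    for p in paths:
        os.stat(p)
    return time.perf_counter() - start


def benchmark(n: int = 10_000, seed: int = 1) -> dict[str, float]:
    """Time lookups of the same n random IDs under both layouts."""
    rng = random.Random(seed)
    ids = rng.sample(range(1_000_000), n)
    results: dict[str, float] = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, layout in (("flat", flat_path), ("nested", nested_path)):
            root = os.path.join(tmp, name)
            paths = [layout(root, i) for i in ids]
            build(paths)
            rng.shuffle(paths)  # look files up in a different order than created
            results[name] = time_lookups(paths)
    return results


if __name__ == "__main__":
    for layout, seconds in benchmark().items():
        print(f"{layout:>6}: {seconds:.4f} s")
```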

  2. Are files in a Linux directory indexed?

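    Again, this depends upon the file-system in use. AFAIK, ext2 directories are not indexed (or hashed); ext3 can use hashed (HTree) directory indexes when the dir_index feature is enabled, and other *nix file-systems may index directories too.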

  3. For example, someone posts a message: which is better, saving all the replies to that message in a single file, or saving each reply in a separate file in a folder created for that message and, when someone views the message, gathering all the replies from those files?

    Reading between the lines, I'm guessing you're thinking of implementing a message-board type system (not unlike PM).

    If so, the "better" will depend upon many factors:

    • Will replies have their own IDs within the 6-digit ID space?
    • Will replies only be displayed subservient to their parent? Or will they be viewable individually?
    • Are replies to replies possible?
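For comparison, here is a rough Python sketch of the two layouts from the question. The directory layout, file names and JSON-lines format are assumptions, and there is no locking, so concurrent writers could collide (the per-reply version at least refuses to overwrite an existing file).

```python
import json
import os


def add_reply_single(msg_dir: str, text: str) -> None:
    """Option 1: append every reply to one file, one JSON string per line."""
    os.makedirs(msg_dir, exist_ok=True)
    with open(os.path.join(msg_dir, "replies.jsonl"), "a", encoding="utf-8") as fh:
        fh.write(json.dumps(text) + "\n")  # json escapes embedded newlines


def read_replies_single(msg_dir: str) -> list[str]:
    """Read the whole thread back with a single file open."""
    path = os.path.join(msg_dir, "replies.jsonl")
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def add_reply_separate(msg_dir: str, text: str) -> str:
    """Option 2: one file per reply, named by a zero-padded sequence number."""
    os.makedirs(msg_dir, exist_ok=True)
    seq = sum(1 for name in os.listdir(msg_dir) if name.endswith(".txt"))
    path = os.path.join(msg_dir, f"{seq:06d}.txt")
    with open(path, "x", encoding="utf-8") as fh:  # "x": fail rather than overwrite
        fh.write(text)
    return path


def read_replies_separate(msg_dir: str) -> list[str]:
    """Gather all reply files when the message is viewed, in posting order."""
    if not os.path.isdir(msg_dir):
        return []
    names = sorted(n for n in os.listdir(msg_dir) if n.endswith(".txt"))
    replies = []
    for name in names:
        with open(os.path.join(msg_dir, name), encoding="utf-8") as fh:
            replies.append(fh.read())
    return replies
```

The trade-off raised by the questions above shows up directly: option 2 gives each reply its own addressable file, while option 1 keeps a message's whole thread in one read.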

A comment: Your proposed schemas /74/83/32/748332/ & /7/4/8/3/3/2/748332/ both incorporate two levels of redundancy. There is no benefit in this.
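To make the redundancy concrete: in a /xx/yy/zz.dat layout the path already spells out the whole ID, so the ID can be recovered from the path alone, and a trailing /748332/ component only repeats digits the parent directories encode. A minimal Python sketch, assuming a hypothetical two-digits-per-level layout with a `.dat` suffix:

```python
import os


def path_to_id(path: str) -> int:
    """Rebuild the 6-digit ID from a .../xx/yy/zz.dat path.

    Assumes the hypothetical layout root/xx/yy/zz.dat, two digits per level.
    """
    parts = os.path.normpath(path).split(os.sep)
    if len(parts) < 3:
        raise ValueError(f"too few path components: {path}")
    top, mid, leaf = parts[-3], parts[-2], parts[-1]
    stem, ext = os.path.splitext(leaf)
    chunks = (top, mid, stem)
    if ext != ".dat" or not all(len(c) == 2 and c.isdecimal() for c in chunks):
        raise ValueError(f"not an xx/yy/zz.dat path: {path}")
    return int(top + mid + stem)


print(path_to_id(os.path.join("data", "74", "83", "32.dat")))  # 748332
```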

Two questions:

If there is a tutorial or book about flat-file databases, that would be great!

The only paper I ever saw on the subject was an IBM Redbook, but that was 15 or 20 years ago, so my memory of it is vague. You could try searching the IBM Redbooks site, but I don't have any good keywords to offer you right now. Maybe some will come back to me.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re: Design flat files database by BrowserUk
in thread Design flat files database by AlfaProject
