I second the advice to use DBM instead of flat files. Flat files tend to get quite messy, don't scale well, and are extremely difficult to maintain.

However, if you insist on a roll-your-own flat-file system, and you're sure that you only need to key on the first word, I do have a suggestion. Instead of using the entire word as a directory name (i.e. /www/search/KEYWORD), you might want to take it a step further and use subdirectories based on the first few letters of each keyword. Your structure would then look something like:

KEYWORD       FILE
dartboard  => /www/search/d/a/rtboard.dat
doghouse   => /www/search/d/o/ghouse.dat
dog        => /www/search/d/o/g.dat
do         => /www/search/d/o.dat
              (note how the .dat suffix avoids colliding with the 'o' directory)
I'm guessing that with a big dataset you'll run into the limit on the number of entries in a single directory. (I hit that limit once, at 32,000 entries, on an old version of Linux.) Splitting things up this way should speed up access (I think) and help you avoid the OS directory limits.
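
For what it's worth, the mapping in the table above can be sketched in a few lines. This is only an illustration, written in Python for brevity; the function name keyword_path and its parameters (root, depth, suffix) are my own inventions, not anything from the original poster's setup. It assumes keywords are plain alphanumeric words and lowercases them, which may not match your data.

```python
from pathlib import PurePosixPath


def keyword_path(keyword, root="/www/search", depth=2, suffix=".dat"):
    """Map a keyword to a nested file path, e.g. dog -> /www/search/d/o/g.dat.

    Hypothetical helper: the name, defaults, and validation rule are
    assumptions made for this sketch.
    """
    # Assumption: keywords are simple alphanumeric words. This also rejects
    # the empty string and anything containing '/' or '.'.
    if not keyword.isalnum():
        raise ValueError("keyword must be a non-empty alphanumeric word")
    keyword = keyword.lower()

    # Use up to `depth` leading letters as directories, but always keep at
    # least one character for the file name (so 'do' becomes d/o.dat).
    n_dirs = min(depth, len(keyword) - 1)
    dirs = list(keyword[:n_dirs])
    filename = keyword[n_dirs:] + suffix

    return str(PurePosixPath(root, *dirs, filename))


if __name__ == "__main__":
    for word in ("dartboard", "doghouse", "dog", "do"):
        print(f"{word:<10} => {keyword_path(word)}")
```

Because file names always end in the suffix and directory names never do, a file such as o.dat can sit next to a directory named o without clashing. A larger depth spreads the files over more directories, at the cost of deeper paths.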

Again, I would use DBM if at all possible.

-Blake


In reply to Re: Poor Person's Database by blakem
in thread Poor Person's Database by Cody Pendant
