Thanks, everyone, for the pointed questions. I had tried to keep my question reasonably general because I thought that would be more useful to others with similar questions.

The specific application is a database of published articles. Think of records like this:

  unique-key|Time Magazine|Why the monks are great|Sep 13, 2006|p245-133|volume 8|number 10

In this plain-text form, the database is about 5GB now (it could grow to 20GB in the future), so the ASCII version fits into RAM. The DB usually changes about once a month, and I could rebuild it from scratch from the store each time. Nothing is guaranteed about the length or uniqueness of any field except the unique key.

I do need quick access to individual words. So if I want to find all articles that contain the word 'Time', the word 'monks', and the number 245, the search should be blindingly fast at finding all unique keys that contain the three words and then displaying those records. Assume access is very frequent, too: say I want to do 'permutation of words' research, where each article launches a search over the database.
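
One detail such a query depends on is tokenization: in the sample record, the number 245 only appears inside the page field "p245-133". Here is a minimal Python sketch (Python only for illustration), assuming a token is any run of ASCII letters or any run of digits, compared case-insensitively, and that the unique key itself is not indexed. `TOKEN_RE` and `tokenize` are hypothetical names.

```python
import re

# Assumed rule: a token is a run of letters or a run of digits, case-folded.
TOKEN_RE = re.compile(r"[A-Za-z]+|[0-9]+")


def tokenize(record):
    """Return the distinct searchable tokens of one pipe-delimited record.

    The unique key (everything before the first '|') is skipped, since it is
    looked up directly rather than searched.
    """
    _key, _sep, rest = record.partition("|")
    return {tok.lower() for tok in TOKEN_RE.findall(rest)}


if __name__ == "__main__":
    rec = "unique-key|Time Magazine|Why the monks are great|Sep 13, 2006|p245-133|volume 8|number 10"
    print(sorted(tokenize(rec)))
```

With this rule "p245-133" yields the tokens p, 245 and 133, so a search for 245 finds the record. Whether page ranges should be split that way is a design choice.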

The lazy implementation would be to put every word as a key into a hash whose value is the array of unique keys where that word occurs, plus a second hash that maps each unique key to its record. Of course, with Perl hashes this would take too much space. From my limited experience with SQL, the rearranged data would also blow up a lot.
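
The lazy two-hash layout could be sketched like this in Python, with dicts and sets standing in for Perl hashes and arrays (sets, because a word may occur more than once in a record). The tokenizer rule and the names `build_index` and `search` are illustrative assumptions. The AND query is a set intersection, started from the rarest word:

```python
import re
from collections import defaultdict

# Assumed tokenizer: runs of letters or runs of digits, case-folded.
TOKEN_RE = re.compile(r"[A-Za-z]+|[0-9]+")


def build_index(lines):
    """Build the two hashes: word -> set of unique keys, and unique key -> record."""
    postings = defaultdict(set)
    records = {}
    for line in lines:
        line = line.rstrip("\n")
        key, _sep, rest = line.partition("|")  # the unique key is the first field
        records[key] = line
        for tok in TOKEN_RE.findall(rest):
            postings[tok.lower()].add(key)
    return postings, records


def search(postings, records, *words):
    """Return the records whose keys appear under every query word (AND query)."""
    sets = [postings.get(w.lower(), set()) for w in words]
    if not sets:
        return []
    sets.sort(key=len)  # rarest word first, so intermediate results stay small
    hits = set.intersection(*sets)
    return [records[k] for k in sorted(hits)]


if __name__ == "__main__":
    data = [
        "k1|Time Magazine|Why the monks are great|Sep 13, 2006|p245-133|volume 8|number 10",
        "k2|Time Magazine|Stock markets|Oct 2, 2006|p12-20|volume 8|number 11",
    ]
    postings, records = build_index(data)
    print(search(postings, records, "Time", "monks", "245"))
```

This is the space-hungry layout in question: every (word, key) pair costs a hash-table entry, plus the overhead of one set per distinct word.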

On the plus side, this is all read-only.
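
Because the data is read-only, the index can be built once and then frozen into a more compact form. One common approach (an assumption here, not something proposed in the thread) is to number the unique keys densely and store each word's postings as a sorted array of integers, intersecting them by binary search. Here is a Python sketch using the standard `array` and `bisect` modules; `freeze`, `intersect` and `search_ids` are hypothetical names, and the input is a word-to-set-of-keys dict like the one the lazy approach produces.

```python
from array import array
from bisect import bisect_left


def freeze(postings):
    """Turn word -> set-of-keys into (sorted key list, word -> sorted id array).

    Assumption: keys get dense integer ids, and each posting list is stored as
    an array of unsigned ints ('I', typically 4 bytes per entry).
    """
    keys = sorted({k for ks in postings.values() for k in ks})
    key_id = {k: i for i, k in enumerate(keys)}
    frozen = {w: array("I", sorted(key_id[k] for k in ks)) for w, ks in postings.items()}
    return keys, frozen


def intersect(a, b):
    """Intersect two sorted arrays by binary-searching the longer one."""
    if len(a) > len(b):
        a, b = b, a
    out = array("I")
    lo = 0
    for x in a:
        lo = bisect_left(b, x, lo)  # search only to the right of the last match
        if lo == len(b):
            break
        if b[lo] == x:
            out.append(x)
    return out


def search_ids(keys, frozen, *words):
    """AND query over the frozen index; returns matching unique keys."""
    lists = sorted((frozen.get(w.lower(), array("I")) for w in words), key=len)
    if not lists:
        return []
    result = lists[0]  # start from the shortest posting list
    for other in lists[1:]:
        result = intersect(result, other)
        if not result:
            break
    return [keys[i] for i in result]
```

Each posting then costs one fixed-size integer instead of a hash-table entry, and the records themselves can live in a separate key-to-record structure.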

SQL would be OK, but it just doesn't feel like the right tool for the job: SQL databases seem designed more for updates than for blindingly fast read access.
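
For comparison, the "rearranged" SQL layout would typically be one row per (word, key) pair, which is where the blow-up comes from. Here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table and column names are hypothetical, and the tokenizer is the same assumed rule as above. With `WITHOUT ROWID`, each table is stored as a B-tree on its primary key, so the postings table doubles as the word index.

```python
import re
import sqlite3

TOKEN_RE = re.compile(r"[A-Za-z]+|[0-9]+")  # assumed tokenizer


def build_sqlite(lines):
    """Load pipe-delimited records into an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE records (ukey TEXT PRIMARY KEY, line TEXT) WITHOUT ROWID")
    # One row per (word, key) pair; the composite primary key is the word index.
    db.execute(
        "CREATE TABLE postings (word TEXT, ukey TEXT, PRIMARY KEY (word, ukey)) WITHOUT ROWID"
    )
    for line in lines:
        line = line.rstrip("\n")
        ukey, _sep, rest = line.partition("|")
        db.execute("INSERT INTO records VALUES (?, ?)", (ukey, line))
        words = {tok.lower() for tok in TOKEN_RE.findall(rest)}
        db.executemany("INSERT INTO postings VALUES (?, ?)", [(w, ukey) for w in words])
    db.commit()
    return db


def search_sql(db, *words):
    """AND query: INTERSECT one key lookup per word, then fetch the records."""
    if not words:
        return []
    sub = " INTERSECT ".join(["SELECT ukey FROM postings WHERE word = ?"] * len(words))
    sql = f"SELECT line FROM records WHERE ukey IN ({sub}) ORDER BY ukey"
    return [line for (line,) in db.execute(sql, [w.lower() for w in words])]
```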

I was also only guessing that an SSD would be a good tool for the job.

Help?


In reply to Re^2: fast disk db with bulk insert, fast read access, compact storage by iaw4
in thread fast disk db with bulk insert, fast read access, compact storage by iaw4
