thanks everyone for the pointed questions. I had tried to keep my question fairly general, because I thought that would be more useful to others with similar questions.
the specific application is a database of published articles. think of records like
unique-key|Time Magazine|Why the monks are great|Sep 13, 2006|p245-133|volume 8|number 10
the database, in plain text and in this form, is about 5GB now (but could grow to 20GB in the future), so the ASCII version fits into RAM. the DB changes rarely, say once per month, and I could simply rebuild everything from scratch from the plain-text store each time. there is no guarantee on the length or uniqueness of anything, except the unique key.
I do need quick access by individual words. so, if I want to find all articles that contain the word 'Time', the word 'monks', and the number 245, the search should be blindingly fast at finding all unique keys whose records contain the three terms, and then display those records. assume access is very frequent, too: say I want to do 'permutation of words' research, so each article launches a search over the database.
the lazy implementation would be to take every word and put it as a key into a hash whose value is an array of the unique keys where that word occurs, plus a second hash that gives me the record for a given unique key. of course, with perl hashes, this would take too much space. from my limited experience with SQL, once I rearrange the data into that form it would also blow up a lot.
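to make the lazy version concrete, here is roughly what I have in mind. this is just a sketch, not a real implementation: the file name 'articles.txt', the tokenization, and using a hash of hashes (rather than arrays) for the posting lists are all placeholders, and it assumes perl 5.10+ for the // operator.

#!/usr/bin/perl
use strict;
use warnings;

# word -> { unique-key => 1 }   (hash of hashes instead of arrays,
#                                so intersecting posting lists is cheap)
# unique-key -> full record line
my (%postings, %records);

# 'articles.txt' is a made-up name; the real store is the
# pipe-delimited file described above, first field = unique key
open my $fh, '<', 'articles.txt' or die "cannot open store: $!";
while (my $line = <$fh>) {
    chomp $line;
    my ($key) = split /\|/, $line;
    $records{$key} = $line;
    # index letter runs and digit runs separately, lowercased,
    # so e.g. '245' inside 'p245-133' is still findable
    for my $word (map { lc } $line =~ /([A-Za-z]+|[0-9]+)/g) {
        $postings{$word}{$key} = 1;
    }
}
close $fh;

# return all unique keys whose records contain every search term
sub find_all {
    my @terms = map { lc } @_;
    # start with the rarest term to keep the working set small
    @terms = sort {
        keys %{ $postings{$a} // {} } <=> keys %{ $postings{$b} // {} }
    } @terms;
    my @hits = keys %{ $postings{ shift @terms } // {} };
    for my $t (@terms) {
        my $p = $postings{$t} // {};
        @hits = grep { $p->{$_} } @hits;
    }
    return @hits;
}

print "$records{$_}\n" for find_all('Time', 'monks', '245');

the worry is exactly what the sketch makes obvious: every word of every record ends up as a perl hash entry, which is where the memory blows up.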
on the plus side, this is all "read-only".
SQL would be OK, but it just feels like the wrong tool for the job. SQL databases seem built more for updating than for blindingly fast read access.
I was also only guessing that an SSD would be a good tool for the job.
help?