in reply to Re^2: fast disk db with bulk insert, fast read access, compact storage
in thread fast disk db with bulk insert, fast read access, compact storage

Your original description of your application:

simple---key, value. ... 32GB ... (think as application of a word hash that I am rebuilding every night, and I want to do real-time search as my users are typing words.)

Is almost completely at odds with this description:

is about 5GB now (but could grow to 20GB in the future) ... the DB changes, say, once per month ... I do need quick access into individual words. so, if I want to find all articles that contain the word 'Time' and the word 'monks' and the number 245, my search should be blindingly fast to find all unique-keys that contain the three words, and then display these records.

The former implies indexing by the characters of the unique key only.
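To illustrate the "search as the user types" reading: indexing by the characters of the key amounts to a prefix lookup over sorted keys. Below is a minimal in-memory sketch in Python, for illustration only. It assumes the keys fit in a sorted list, and `prefix_matches` is a hypothetical helper name, not a CPAN module or a disk-backed store.

```python
import bisect

def prefix_matches(sorted_keys, prefix, limit=10):
    """Return up to `limit` keys from `sorted_keys` that start with `prefix`.

    `sorted_keys` must already be sorted. A binary search finds the first
    key >= prefix; all keys sharing the prefix follow it contiguously.
    """
    i = bisect.bisect_left(sorted_keys, prefix)
    out = []
    while i < len(sorted_keys) and len(out) < limit and sorted_keys[i].startswith(prefix):
        out.append(sorted_keys[i])
        i += 1
    return out

keys = sorted(["monk", "monks", "monastery", "time", "timer", "tim"])
print(prefix_matches(keys, "mon"))   # ['monastery', 'monk', 'monks']
```

Each keystroke costs one binary search plus a short scan, but this only ever looks at the key itself, never at the words inside the records.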

The latter requires a fully inverted index of all the words in every record, which essentially makes the unique key redundant.
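For contrast, here is a minimal in-memory sketch of an inverted index answering the 'Time' AND 'monks' AND 245 query. It is written in Python for illustration only. The tokenising rule (lower-cased `\w+` runs) and the names `build_inverted_index` and `search_all` are assumptions, not any particular module's API.

```python
import re
from collections import defaultdict

def build_inverted_index(records):
    """Map each lower-cased word to the set of record keys containing it."""
    index = defaultdict(set)
    for key, text in records.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(key)
    return index

def search_all(index, *words):
    """Return the sorted keys of records containing every one of `words`."""
    # .get() avoids inserting empty entries into the defaultdict.
    postings = [index.get(w.lower(), set()) for w in words]
    if not postings:
        return []
    # Start from the rarest word so the intersection stays small.
    postings.sort(key=len)
    result = set(postings[0])
    for p in postings[1:]:
        result &= p
        if not result:
            break
    return sorted(result)

records = {
    "a1": "Time flies for monks in room 245",
    "a2": "Monks keep time",
    "a3": "Room 245 is empty",
}
idx = build_inverted_index(records)
print(search_all(idx, "Time", "monks", "245"))   # ['a1']
```

A disk-backed version could still be a plain key/value store, but one keyed by word (value = list of record keys) rather than by record.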

You need to define how your data will actually be used before looking for a mechanism to do it.


Re^4: fast disk db with bulk insert, fast read access, compact storage
by iaw4 (Monk) on Sep 18, 2010 at 22:27 UTC

    Actually, I think the db descriptions are pretty much the same (5GB for testing now, 20GB in the future, so 32GB is a good upper limit), although neither description was very good. However, what is very good is that you told me what I am really looking for: an inverted index. Thanks a lot, very helpful. I should now be able to search for this much more intelligently.

    So I need a nice, fast inverted-index program for Perl on Ubuntu. Knowing what I need, I could now search CPAN. As luck would have it, Search::Moose seems to be designed for this sort of job. (As bad luck would have it, it aborts during the build stage on my Ubuntu machine. If someone knows more about Search::Moose, please let me know.)

    Thanks a lot, everybody.

    regards, /iaw

      I think the db descriptions are pretty much the same

      I guess we read different things.

      as luck would have it, Search::Moose seems to be designed for this sort of job.

      Hm. CPAN didn't turn up anything when I searched for Search::Moose.

      And, if you're looking for "blindingly fast", anything with "Moose" in the title probably isn't going to cut it.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        :-)

        I meant Search::Mousse. I guess I was thinking too Canadian...

        These have not been my best days (posts).