in reply to Data store and threads

Is 2kk+ a typo or do you mean 2M+ entries? I guess the second, because 2k doesn't sound like "lots".

Generally, if you can keep the data in memory, it will be a lot faster than anything you have to read from disk. But depending on what searches you need to perform (for example, searches for single words without wildcards), it might make more sense to store the data on disk and keep a suitable search index (for example, a simple hash) in memory instead.
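To make the "data on disk, index in memory" idea concrete, here is a minimal sketch. It assumes one record per line with the search key as the first tab-separated field; the sample records, the `%offset_of` hash and the `lookup` sub are invented for illustration and are not from the original post.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample records: "key<TAB>payload", one per line.
my @records = ( "apple\tfirst record", "mile\tsecond record", "zebra\tthird record" );

# Write the data to disk, remembering the byte offset where each record starts.
my ( $out, $path ) = tempfile( UNLINK => 1 );
my %offset_of;    # in-memory index: key => byte offset in the file
for my $rec (@records) {
    my ($key) = split /\t/, $rec;
    $offset_of{$key} = tell $out;
    print {$out} "$rec\n";
}
close $out or die "close failed: $!";

# Look up a key: one hash probe in memory, then one seek and one read on disk.
sub lookup {
    my ($key) = @_;
    return unless exists $offset_of{$key};
    open my $in, '<', $path or die "Cannot open $path: $!";
    seek $in, $offset_of{$key}, 0 or die "seek failed: $!";
    my $line = <$in>;
    close $in;
    chomp $line;
    return $line;
}

print lookup('mile'), "\n";    # prints the stored line for 'mile'
```

Only the keys and one integer per record stay in memory, so this pays off when the payloads are large compared to the keys. For exact-match lookups only; wildcard searches would need a different index.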

Fuzzy searches with wildcards, or finding 'mile' when you type in 'miles', need more effort, but there are packages that allow this.
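The post does not say which packages it has in mind, so the following is only one possible approach, written in plain Perl rather than with any particular module: an edit-distance (Levenshtein) comparison that treats 'mile' and 'miles' as close. The `edit_distance` sub and the word list are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Classic dynamic-programming edit distance (Levenshtein), keeping two rows.
sub edit_distance {
    my ( $s, $t ) = @_;
    my @prev = 0 .. length $t;
    for my $i ( 1 .. length $s ) {
        my @cur = ($i);
        for my $j ( 1 .. length $t ) {
            my $cost = substr( $s, $i - 1, 1 ) eq substr( $t, $j - 1, 1 ) ? 0 : 1;
            $cur[$j] = min( $prev[$j] + 1, $cur[ $j - 1 ] + 1, $prev[ $j - 1 ] + $cost );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Hypothetical word list; a real one would come from the stored data.
my @words = qw(mile smile milk kilometre);
my @close = grep { edit_distance( 'miles', $_ ) <= 1 } @words;
print "@close\n";    # prints "mile"
```

Comparing the query against every entry like this is linear in the number of entries, which is why fuzzy matching over millions of records usually needs a purpose-built index instead of a plain scan.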

Re^2: Data store and threads
by Kirche (Sexton) on Feb 11, 2010 at 16:36 UTC
    Yes, 2M. I can't store all data in memory. Data is like:
    412415
    535236642
    32523
    
    I just need to check presence of certain digits in this db.
      Yes, 2M. I can't store all data in memory.

      Why not? A hash containing 2 million keys takes around 65 MB of memory:

      undef $h{ 0+ int rand 2**32 } for 1 .. 2e6;;
      print scalar keys %h;;
      1999519
      print total_size \%h;;
      67353411
      assuming that i need to write/search through this data rapidly from several threads

      Can you say a little more about what you are doing with this data?

      What would you be writing to it?

      Do changes need to persist beyond the life of the program?

      How many threads?

      How rapidly?

      I just need to check presence of certain digits in this db.

      Do you mean numbers rather than "digits"?
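If whole numbers are what is meant, a presence check against an in-memory hash could look like the sketch below. It is single-threaded and uses the three sample values from the post; the `%seen` hash and the candidate values are made up for illustration, and sharing the hash between threads is not covered here.

```perl
use strict;
use warnings;

# Sample values taken from the post; a real program would load all ~2M of them.
my @numbers = ( 412415, 535236642, 32523 );

# Use a hash as a set: only the keys matter, so every value is left undef.
my %seen;
@seen{@numbers} = ();

# exists tests for the key without caring about the (undef) value.
for my $candidate ( 32523, 99999 ) {
    printf "%d is %s\n", $candidate, exists $seen{$candidate} ? 'present' : 'absent';
}
```

With the sample data this reports 32523 as present and 99999 as absent. Each check is a single hash probe, so the cost does not grow with the number of stored entries.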


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.