Kirche has asked for the wisdom of the Perl Monks concerning the following question:

What is the best way to store lots of data (2kk+ entries, plain text delimited with \n), assuming that I need to write to and search through this data rapidly from several threads (without an additional SQL server)?

Re: Data store and threads
by zentara (Cardinal) on Feb 11, 2010 at 16:27 UTC
Re: Data store and threads
by jethro (Monsignor) on Feb 11, 2010 at 15:22 UTC

    Is "2kk+" a typo, or do you mean 2M+ entries? I guess the latter, because 2k doesn't sound like "lots".

    Generally, if you can keep the data in memory, it will be a lot faster than anything you can read from disk. But depending on what searches you need to perform (for example, searches for single words without wildcards), it might make more sense to store the data on disk and keep a suitable search index (for example, a simple hash) in memory instead.
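A minimal sketch of the disk-plus-index idea, assuming one record per line and that the first whitespace-separated field of a line is its search key (both are assumptions, and `build_index` and `lookup` are hypothetical helper names). Only a hash from key to byte offset lives in memory; each lookup seeks straight to the record on disk.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: scan the file once and remember where each record starts.
# Assumes one record per line, with the search key as the first field.
sub build_index {
    my ($path) = @_;
    my %offset;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $pos = tell $fh;                        # byte offset of the next line
    while ( my $line = <$fh> ) {
        my ($key) = split ' ', $line;          # first whitespace-separated field
        $offset{$key} //= $pos if defined $key;   # keep the first occurrence
        $pos = tell $fh;
    }
    close $fh;
    return \%offset;
}

# Hypothetical helper: seek directly to the record instead of rescanning the file.
sub lookup {
    my ( $path, $index, $key ) = @_;
    my $pos = $index->{$key};
    return undef unless defined $pos;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    seek $fh, $pos, 0 or die "Cannot seek: $!";   # 0 = absolute position (SEEK_SET)
    my $line = <$fh>;
    close $fh;
    chomp $line;
    return $line;
}

# Demo: write the sample values from this thread to a temporary file.
my ( $out, $file ) = tempfile( UNLINK => 1 );
print {$out} "$_\n" for qw(412415 535236642 32523);
close $out;

my $idx = build_index($file);
my $hit = lookup( $file, $idx, '32523' );
printf "%s\n", $hit // 'not found';    # prints 32523
```

The index holds one entry per record, so for short records such as bare numbers it saves little memory compared with a hash of the data itself; it pays off mainly when records are long and keys are short.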

    Fuzzy searches with wildcards, or finding 'mile' when you type in 'miles', need more effort; there are packages that allow this.
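To illustrate only the simplest possible approach (a brute-force scan, not one of the packages mentioned), here is a sketch that turns a shell-style wildcard pattern into a Perl regex and tests every key. The helper names `glob_to_regex` and `wildcard_search` are hypothetical; it handles only `*` and `?`, and matching 'mile' for 'miles' would additionally need stemming.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a shell-style pattern ('*' = any run of
# characters, '?' = exactly one character) into an anchored regex.
sub glob_to_regex {
    my ($pattern) = @_;
    my $re = join '',
        map { $_ eq '*' ? '.*' : $_ eq '?' ? '.' : quotemeta($_) }
        split //, $pattern;
    return qr/\A$re\z/;
}

# Brute force: every key is tested, so each search costs O(number of keys).
sub wildcard_search {
    my ( $pattern, @keys ) = @_;
    my $re = glob_to_regex($pattern);
    return grep { $_ =~ $re } @keys;
}

my @keys = qw(412415 535236642 32523);
print join( ',', wildcard_search( '325*', @keys ) ), "\n";   # prints 32523
```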

      Yes, 2M. I can't store all data in memory. The data looks like this:
      412415
      535236642
      32523
      
      I just need to check presence of certain digits in this db.
        Yes, 2M. I can't store all data in memory.

        Why not? A hash containing 2 million keys takes around 65 MB of memory:

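        use Devel::Size qw( total_size );
        my %h;
        undef $h{ 0+ int rand 2**32 } for 1 .. 2e6;
        print scalar keys %h, "\n";     # 1999519 in the original run (a few random keys collide)
        print total_size( \%h ), "\n";  # 67353411 bytes in the original run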
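A minimal sketch of the in-memory approach for the stated need (checking whether a number is present), assuming one number per line as in the sample data; `load_numbers` and `has_number` are hypothetical names. It stores keys with undefined values, the same trick as above, and tests membership with `exists`, which does not create missing keys.

```perl
use strict;
use warnings;

# Hypothetical loader: one number per line, as in the sample data.
sub load_numbers {
    my ($fh) = @_;
    my %seen;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless length $line;    # skip blank lines
        undef $seen{$line};          # store the key only; no value needed
    }
    return \%seen;
}

# Presence test: exists() checks the key without autovivifying it.
sub has_number {
    my ( $set, $n ) = @_;
    return exists $set->{$n};
}

# Demo on an in-memory "file" holding the sample values from the thread.
my $sample = "412415\n535236642\n32523\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $set = load_numbers($fh);
close $fh;

printf "%s\n", has_number( $set, 32523 ) ? 'found' : 'missing';   # found
printf "%s\n", has_number( $set, 99999 ) ? 'found' : 'missing';   # missing
```

In a real program the filehandle would come from `open` on the data file rather than the in-memory string used here.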
        assuming that i need to write/search through this data rapidly from several threads

        Can you say a little more about what you are doing with this data?

        What would you be writing to it?

        Do changes need to persist beyond the life of the program?

        How many threads?

        How rapidly?

        I just need to check presence of certain digits in this db.

        Do you mean numbers rather than "digits"?


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.