in reply to storing a large hash in a database

What does your data structure do, exactly? I'm not understanding the logic, and a good grasp of the logic is needed before a database structure can be suggested. Can you give a few examples of your dictionary data and what you're searching against it with?

Replies are listed 'Best First'.
Re^2: storing a large hash in a database
by Anonymous Monk on Dec 02, 2011 at 17:01 UTC
    the dictionary file structure is like this:
    text ||| text ||| number,number,number,number,number
    while I care only about the texts and the third/first number. The dictionary has millions of lines. If I can read it once and write it to the physical disc as a database, I don't need to load it each time in the memory and will do my search with SQL queries.

      It really depends on your data. It appears that you take the first two fields (I'll call them text1 and text2) and use one as the key to a hash, pushing the other onto an array which is the value of that key, and which is the key and which goes onto its array depends on whether a field later in the line is greater than some parameter you supply.

      So you end up with a hash of arrays. Since your hash keys are unique, they could be one field in your database. But we don't know what ends up in your arrays. Do they all have the same length? Are they consistent in what they hold? To map your hash to a SQL table in a one-to-one fashion, the hash values all need to be structured the same.

      As an example, let's say your file looks like this:

      A ||| Adam ||| 1,2,3,4,5 B ||| Bob ||| 1,2,3,4,5 B ||| Brian ||| 1,2,3,4,5 Carl ||| C ||| 3,2,1,4,5 Bruce ||| B ||| 3,2,1,4,5 C ||| Cheryl ||| 1,2,3,4,5

      And you call your sub with a $p value that causes the single-letter values to be used as the keys, with the longer texts pushed onto them as arrays. You'll end up with this hash:

      %hash = ( A => [ 'Adam' ], B => [ 'Bob', 'Brian', 'Bruce' ], C => [ 'Carl', 'Cheryl' ]);

      Here, your hash data doesn't "line up" with columns in a database table, since you have different numbers of fields for each one. So in this case, you'd either need to concatenate your arrays into a single "name" field with a delimiter, or have multiple records for each key. One of the following layouts:

      letter names ------------------------ A Adam B Bob;;Brian;;Bruce C Carl;;Cheryl letter name ------------------------ A Adam B Bob B Brian B Bruce C Carl C Cheryl

      Generally, the second method is best, because it simplifies searches and saves the cost of splitting on the delimiter every time you need a name. But it depends a lot on what your actual data looks like. Your first step should be to think about your data and design a table for it, and then figure out how to insert the values. It doesn't matter that you're putting it in a hash now; think about how it should be structured in terms of set columns and go from there.

      Aaron B.
      My Woefully Neglected Blog, where I occasionally mention Perl.

      Need to see examples of the data. What text? What number formats? What does it represent? What are you searching against it with? If we're left in the dark it's a waste of time to go crawling around randomly.