Hello monks, I would like to call upon your wisdom again today.

How can I create a hash that stores its data on a physical disk?

I have looked over some documentation, and there seem to be many ways to do this, but I can't make sense of them. One method involved writing a special Perl module and defining my own functions for "store", "delete", etc.; I would like to avoid that if possible. I also saw tie and Tie::StdHash used, but found them confusing. Currently I am using tie with DB_File to tie a hash to a DB file, but I am having trouble inserting new data.
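For plain string values, the tie approach already does the heavy lifting: DB_File implements the tie handlers (STORE, FETCH, DELETE, ...) itself, so no custom module is needed. A minimal sketch, assuming a scratch file named `tempfile` in the current directory, that writes two entries, unties, and then re-ties the same file to show that the data comes back from disk:

```perl
use strict;
use warnings;
use DB_File;    # also re-exports the Fcntl O_* flags used below

my $file = "tempfile";    # scratch file, as in the question
unlink $file;             # start from an empty file

# First session: tie the hash to the file and store plain strings.
my %hash;
tie %hash, "DB_File", $file, O_RDWR|O_CREAT, 0666, $DB_HASH
    or die "Cannot open '$file': $!\n";
$hash{apple}  = "red";
$hash{banana} = "yellow";
untie %hash;              # flushes and closes the file

# Second session: re-tie the same file; the values are read from disk.
my %again;
tie %again, "DB_File", $file, O_RDWR, 0666, $DB_HASH
    or die "Cannot reopen '$file': $!\n";
my %copy = %again;        # copy into an ordinary in-memory hash
untie %again;
unlink $file;

print "$_ => $copy{$_}\n" for sort keys %copy;
```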

I need to store roughly 5 GB of data as a hash of arrays, so I cannot keep it all in my system's RAM. My problem comes when I attempt to push a new value onto one of the arrays.
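The push most likely fails because DB_File (like the other DBM modules) can only store strings: a reference assigned to a tied element is stringified, and what comes back from the file is plain text, not an array. A small sketch that demonstrates this, assuming a scratch file named `demo.db`:

```perl
use strict;
use warnings;
use DB_File;

my $file = "demo.db";     # hypothetical scratch file
unlink $file;

my %hash;
tie %hash, "DB_File", $file, O_RDWR|O_CREAT, 0666, $DB_HASH
    or die "Cannot open '$file': $!\n";

# Store an array reference. DB_File only stores strings, so the reference
# is turned into text such as "ARRAY(0x...)" on the way to the file.
$hash{fruit} = [ "apple", "banana" ];

my $stored = $hash{fruit};
print "ref type: '", ref($stored), "'\n";   # empty: no longer a reference
print "stored:   $stored\n";                # the stringified address

untie %hash;
unlink $file;
```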

Is there a simple way to do this that I am missing? My code works when the hash is not tied to a file, but fails when I tie it to disk. Read/write speed on the hash is still important to me, although I realize writing to disk is much slower than RAM.

Here is an example of my code:

    use DB_File;

    my %hash;
    unlink "tempfile";    # Remove previous file, if any
    tie %hash, "DB_File", "tempfile", O_RDWR|O_CREAT, 0666, $DB_HASH
        or die "Cannot open file 'tempfile': $!\n";

    while ($sourceString =~ /example(key)regex(value)example\b/ig) {
        my $key   = $1;
        my $value = $2;
        push( @{ $hash{ $key } }, $value );    # Push the value into the hash
    }

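One way to keep a hash of lists in DB_File without writing tie handlers is to store each list as a single delimited string and append to it. This is a minimal sketch, not the original code: the input string and the `key=value` regex are hypothetical stand-ins for `$sourceString` and the real pattern, and it assumes a NUL byte never appears inside a value.

```perl
use strict;
use warnings;
use DB_File;

my $file = "tempfile";
unlink $file;                       # Remove previous file, if any

my %hash;
tie %hash, "DB_File", $file, O_RDWR|O_CREAT, 0666, $DB_HASH
    or die "Cannot open file '$file': $!\n";

# Hypothetical input standing in for $sourceString and the real regex.
my $sourceString = "a=1 b=2 a=3";

my $SEP = "\0";                     # assumed never to occur inside a value
while ($sourceString =~ /(\w+)=(\w+)/g) {
    my ($key, $value) = ($1, $2);
    # Append to the stored string instead of pushing onto an array ref.
    my $old = $hash{$key};
    $hash{$key} = defined $old ? "$old$SEP$value" : $value;
}

# Read a "list" back by splitting the stored string on the separator.
my @a_values = split /\0/, $hash{a};
my @b_values = split /\0/, $hash{b};
print "a => @a_values\n";           # a => 1 3

untie %hash;
unlink $file;
```

Each append reads and rewrites the whole record for that key, so with long lists this may get slower as the lists grow.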
I understand that if I write my own handlers for "store", "delete", etc., I could make each newly assigned value be appended to the array, but I would like to stay away from hairy situations...

Update:

I need to store around 1000 values in each array.
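With around 1000 values per key, another option built into DB_File is a BTREE database opened with the R_DUP flag, which lets one key hold many separate records; the get_dup method then returns them all. A sketch under those assumptions, with a hypothetical file `dups.db` and made-up sample values:

```perl
use strict;
use warnings;
use DB_File;

my $file = "dups.db";     # hypothetical file name
unlink $file;

# Allow several records with the same key (BTREE databases only).
$DB_BTREE->{'flags'} = R_DUP;

my %hash;
my $db = tie %hash, "DB_File", $file, O_RDWR|O_CREAT, 0666, $DB_BTREE
    or die "Cannot open '$file': $!\n";

# With R_DUP each assignment adds a record instead of overwriting.
for my $value (1 .. 1000) {
    $hash{"a"} = $value;
}

my $count  = $db->get_dup("a");   # scalar context: number of values for "a"
my @values = $db->get_dup("a");   # list context: all values for "a"
print "a has $count values\n";

undef $db;        # drop the extra reference before untie
untie %hash;
unlink $file;
```

With R_DUP, a plain `$hash{$key}` lookup returns only one of the values, so reads should go through `get_dup`.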

Solved:

I have returned from MySQL land with a solution. Since my input data is formatted as strings ("value.key"), I wrote them in bulk to a temporary file. I then used MySQL's LOAD DATA INFILE statement to populate a temporary table. Next, I used an INSERT combined with MySQL's string functions to build a table with two columns, key and value: the INSERT took all the data in the temporary table and inserted it into the new two-column table. Now I can use "SELECT ... WHERE key = ..." to emulate Perl's amazing hashes. It is not as fast as I would like, but I can now process massive input files.
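For comparison, a hedged Perl sketch of an alternative to splitting the strings inside MySQL: split each "value.key" line before loading and write a tab-separated key/value file. Tab and newline are LOAD DATA INFILE's default field and line terminators, so such a file could be loaded straight into the two-column table, skipping the temporary table. The sample lines and the file name `pairs.tsv` are hypothetical, and the code assumes the key is everything after the last dot.

```perl
use strict;
use warnings;

# Hypothetical sample input in the "value.key" format described above.
my @lines = ("1.a", "2.b", "3.a");

my $tsv = "pairs.tsv";    # hypothetical output file
open my $out, '>', $tsv or die "Cannot write '$tsv': $!\n";
for my $line (@lines) {
    # Assumption: the key follows the LAST dot, so values may contain dots.
    my ($value, $key) = $line =~ /^(.*)\.([^.]*)$/ or next;
    print {$out} "$key\t$value\n";
}
close $out or die "Cannot close '$tsv': $!\n";

# Read the file back to show its contents.
open my $in, '<', $tsv or die "Cannot read '$tsv': $!\n";
my @rows = <$in>;
close $in;
unlink $tsv;
print @rows;
```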

Thank you monks. I accept and appreciate your wisdom.


In reply to Disk based hash (as opposed to RAM based) by techtruth
