in reply to storing a large hash in a database

I had a process with a hash that grew too big for RAM, and tied it to disk with a few simple lines:

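    use DB_File;
    use Fcntl;   # O_RDWR, O_CREAT

    # too big for RAM now
    my $db = '/path/to/some.db';
    my %hash;
    unlink $db; # to start fresh
    # DB_File's tie takes the filename, flags, mode, then the database type
    tie(%hash, "DB_File", $db, O_RDWR|O_CREAT, 0666, $DB_BTREE)
        or die("Unable to tie $db: $!");
    # do things as before
    untie %hash;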

Performance is fine.

Re^2: storing a large hash in a database
by elef (Friar) on Jun 13, 2013 at 11:50 UTC
    Found this through Google and it helped me filter dupes out of massive files (the %seen hash got too big for memory). Thanks!
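    For anyone landing here with the same problem, a rough sketch of that kind of duplicate filter might look like the following. The sub name filter_dupes and the scratch-file argument are made up for illustration; only the tie call itself comes from DB_File.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl;   # O_RDWR, O_CREAT

# Hypothetical helper: copy lines from $in_fh to $out_fh, skipping any
# line that has already been seen. $db_path is a scratch file that holds
# the %seen hash on disk instead of in RAM.
sub filter_dupes {
    my ($in_fh, $out_fh, $db_path) = @_;

    unlink $db_path;    # start with an empty %seen on every run
    tie my %seen, 'DB_File', $db_path, O_RDWR|O_CREAT, 0666, $DB_BTREE
        or die "Unable to tie $db_path: $!";

    my $kept = 0;
    while (my $line = <$in_fh>) {
        next if $seen{$line}++;    # false only the first time a line appears
        print {$out_fh} $line;
        $kept++;
    }

    untie %seen;
    unlink $db_path;    # the scratch file is no longer needed
    return $kept;
}
```

    Called as filter_dupes(\*STDIN, \*STDOUT, '/path/to/seen.db'), it keeps the first occurrence of each line and preserves the input order.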
Re^2: storing a large hash in a database
by Anonymous Monk on Dec 02, 2011 at 17:10 UTC
    Interesting. May I ask whether the database is deleted after the program ends, or whether it is kept in physical memory? If it is deleted, can I force it to stay in physical memory without deleting it?

      It is not in memory. And if you don't unlink the file, it remains on disk; and if you re-tie, you have access to it.

      In the snippet I gave, I unlink before tie'ing; because for my application, I want to start anew every time.
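
      To make the "re-tie and you have access to it" part concrete, here is a minimal sketch, assuming a scratch file in the current directory. The helper names store_pair and fetch_value are invented for this example. The reader opens the file with O_RDONLY and no O_CREAT, so a missing file is an error rather than a fresh empty database.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl;   # O_RDWR, O_CREAT, O_RDONLY

# Hypothetical helper: add one key/value pair to the file at $path,
# creating the file if it does not exist. Note: no unlink here.
sub store_pair {
    my ($path, $key, $value) = @_;
    tie my %h, 'DB_File', $path, O_RDWR|O_CREAT, 0666, $DB_BTREE
        or die "Unable to tie $path: $!";
    $h{$key} = $value;
    untie %h;
}

# Hypothetical helper: re-tie an existing file read-only and look up a key.
# Without O_CREAT the tie fails if the file is missing.
sub fetch_value {
    my ($path, $key) = @_;
    tie my %h, 'DB_File', $path, O_RDONLY, 0666, $DB_BTREE
        or die "Unable to tie $path: $!";
    my $value = $h{$key};
    untie %h;
    return $value;
}

my $db = "demo_$$.db";                 # scratch file, name is an assumption
store_pair($db, apple => 'red');       # first "run" writes and unties
print "apple is ", fetch_value($db, 'apple'), "\n";   # second "run" re-ties
unlink $db;                            # clean up the demo file
```

      The data survives between the two tie calls only because nothing unlinks the file in between.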

Re^2: storing a large hash in a database
by Anonymous Monk on Dec 03, 2011 at 15:25 UTC
    Thanks to your great advice, I could do it with DB_File. Now I have a DB_File database (*.db) on my hard disk which is about 5 GB. However, I cannot re-access it.
        $DB_BTREE->{'flags'} = R_DUP;
        my $x = tie %dictf, "DB_File", "f.db", O_RDWR|O_CREAT, 0666, $DB_BTREE
            or die "Cannot open file f.db: $!\n";
        my $number = keys %dictf;
        print "$number\n";
        my $number = keys %dictf;
        print "$number\n";
        $key = $value = 0;
        for ($status = $x->seq($key, $value, R_FIRST);
             $status == 0;
             $status = $x->seq($key, $value, R_NEXT)) {
            print "$key -> $value\n";
        }
    and the result is:
    0
    while I have:
    -rw-r--r-- 1 xxxxx xxx 4.2G Dec 3 14:03 f.db
    as my database. What am I doing wrong?

      Try tie'ing without O_CREAT:

          tie %dictf, "DB_File", "f.db", O_RDWR, 0666, $DB_BTREE
              or die "Cannot open file f.db: $!\n";

      So, in hindsight, my unlink was unnecessary. And note that the default option is O_CREAT|O_RDWR, so you need to be explicit if you don't want to wipe it out.

Re^2: storing a large hash in a database
by Anonymous Monk on Dec 02, 2011 at 16:56 UTC
    Great, but one question: when I tie the hash, is it still held in local memory? Reading my file into a hash consumes a lot of RAM.

      Did you read the previous answer or the DB_File documentation?

      DB_File stores the data on disk using the Berkeley DB library, and it uses tie so that it looks like your data was stored in an ordinary hash.
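
      If the tie part is unclear, this small sketch shows the mechanism without DB_File. The class name LoggingHash is invented for the example. It is built on Tie::StdHash from the core Tie::Hash module and keeps its data in memory, but every hash access goes through its STORE and FETCH methods. DB_File provides methods of the same kind that read and write the file instead.

```perl
use strict;
use warnings;

package LoggingHash;           # hypothetical class, for illustration only
use Tie::Hash;                 # core module, provides Tie::StdHash
our @ISA = ('Tie::StdHash');

our @log;                      # records every call made through the tie

sub STORE {
    my ($self, $key, $value) = @_;
    push @log, "STORE $key";
    return $self->SUPER::STORE($key, $value);
}

sub FETCH {
    my ($self, $key) = @_;
    push @log, "FETCH $key";
    return $self->SUPER::FETCH($key);
}

package main;

tie my %h, 'LoggingHash';
$h{x} = 1;                     # looks like a plain assignment, calls STORE
my $v = $h{x};                 # looks like a plain lookup, calls FETCH
print join(", ", @LoggingHash::log), "\n";   # prints "STORE x, FETCH x"
```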

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      Short answer is no. Try it and see!