keymon has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to build a rather large read-only database with millions of keys. The MySQL/Postgres/etc. route is not available to me; I need to do this entirely through Perl. So I decided on GDBM_File for this. Now, it will let me do
 tie %hash, 'GDBM_File', $filename, &GDBM_WRCREAT, 0640;
The problem is that the keys are sorted, and the script gets bogged down rearranging data on disk as the database is being built. Judging from the output of top, it is perpetually in disk wait, and it doesn't help that the disk is a slow IDE drive :( My question is: is it possible to build the database in memory, and then just write it out in one fell swoop?

Update: A simple little flag made a _WORLD_ of difference. We're talking a 1000x increase in performance here. I added &GDBM_FAST to the tie flags, and this thing flies now:

 tie %hash, 'GDBM_File', $filename, &GDBM_WRCREAT|&GDBM_FAST, 0640;
Thank you everyone for replying; even if your replies didn't solve the problem, I am grateful that you took the time to respond. :-)
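
For anyone who wants to try the same thing, here is a minimal sketch of a build loop using the flag from the update. The file name and the sample keys are made up for illustration, and the explicit sync before untie is my own addition rather than something from the original script. As far as I know, newer gdbm releases treat fast mode as the default and keep GDBM_FAST only for compatibility, so the size of the speedup may depend on the gdbm version installed.

```perl
use strict;
use warnings;
use GDBM_File;

my $filename = 'keys.gdbm';    # hypothetical path, not from the original post

# GDBM_FAST asks gdbm not to force every store out to disk immediately.
tie my %hash, 'GDBM_File', $filename, GDBM_WRCREAT() | GDBM_FAST(), 0640
    or die "Cannot tie $filename: $!";

# Hypothetical sample data; a real build would stream millions of pairs.
for my $i (1 .. 1000) {
    $hash{ sprintf 'key%07d', $i } = "value $i";
}

# Flush any buffered writes once, at the end, then close the database.
tied(%hash)->sync;
untie %hash;
```

Since fast mode defers the writes, a crash during the build can leave the file inconsistent. For a read-only database rebuilt from source data that is usually acceptable: just rerun the build.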

Re: GDBM_File to memory?
by BrowserUk (Patriarch) on Dec 23, 2005 at 01:00 UTC

    You could create it on a ram drive then copy it to disk.
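
    A rough sketch of that idea, assuming a Linux-style RAM-backed /dev/shm (with a fallback to the system temp directory, which may not be in RAM). Both file names and the sample data are hypothetical.

```perl
use strict;
use warnings;
use GDBM_File;
use File::Copy qw(move);
use File::Spec;

# Assumption: /dev/shm is a RAM-backed tmpfs. If it is missing, fall back
# to the ordinary temp directory (which may live on a real disk).
my $ram_dir   = -d '/dev/shm' ? '/dev/shm' : File::Spec->tmpdir;
my $ram_path  = File::Spec->catfile($ram_dir, "keys.$$.gdbm");    # hypothetical name
my $disk_path = 'keys_from_ram.gdbm';                             # hypothetical name

# Build the whole database on the RAM drive, where the random writes are cheap.
tie my %hash, 'GDBM_File', $ram_path, GDBM_WRCREAT(), 0640
    or die "Cannot tie $ram_path: $!";
$hash{ sprintf 'key%07d', $_ } = "value $_" for 1 .. 1000;
untie %hash;

# Transfer the finished file to the slow disk in a single pass.
# File::Copy::move copies and then deletes when the paths are on different filesystems.
move($ram_path, $disk_path) or die "Cannot move $ram_path to $disk_path: $!";
```

    This only works if the finished file fits in the RAM drive, so check the expected database size first.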


Re: GDBM_File to memory?
by tirwhan (Abbot) on Dec 23, 2005 at 07:00 UTC

    Hmm, this sounds very odd to me. Any OS worth its salt should be caching the hot data in RAM so that the access speed of the disk does not totally bog the process down. So either you have too little RAM for this task or your OS isn't doing its job properly. Or you've got the filesystem mounted in sync mode, which forces it to constantly write changes back to the disk.

    Which OS is this (and if it's a *NIX, which filesystem and which mount options are you using)?


      It's a RAID5 disk. As I mentioned in the update above, not having the GDBM_FAST flag made it go to disk for each and every operation. On a RAID5, this is a killer. I added the flag, and the run time went from days to a couple of minutes :)
Re: GDBM_File to memory?
by perrin (Chancellor) on Dec 23, 2005 at 04:40 UTC
    It sounds like you're looking for Storable, not a dbm file.
      I can't use Storable because the DB has to be read by other programs (in C, C++, etc.).