in reply to Re: Increasing CPU Usage/Decreasing Run Time
in thread Increasing CPU Usage/Decreasing Run Time

Yeah - that would make sense as well. The only thing that contradicts that is that I'm using tied hashes. I've untied the actual database from them and let them run just as plain hashes, and still encountered the same problem.

Here's how I open the BerkeleyDB hash tables, though:
tie %$file, "BerkeleyDB::Hash",
    -Filename => $file,
    -Flags    => DB_CREATE
    or die "Cannot open file $file: $! $BerkeleyDB::Error\n";
############ UPDATE #############

I just re-ran, again, with those hashes untied. That's exactly what the problem is: too much of the run time is going to IO against the hash tables. Any idea how to speed this up? I'm thinking about a process that does all the hard processing in memory and only dumps to the hash tables once that is done.
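
To make that last idea concrete, here is a minimal sketch of the "process first, dump afterwards" approach. It assumes the working set fits in memory. CountingHash is a hypothetical stand-in for the tied database hash (it is not part of BerkeleyDB); it only counts STORE calls so you can see how many writes would reach the database.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a tied database hash: it counts STORE calls
# so the number of writes that would hit the database is visible.
package CountingHash;
require Tie::Hash;
our @ISA    = ('Tie::StdHash');
our $stores = 0;
sub STORE { $stores++; $_[0]->{$_[1]} = $_[2] }

package main;

# Do the heavy processing against a plain in-memory hash.
my %work;
$work{$_}++ for qw(a b a c a b);    # six updates, three distinct keys

# Only when processing is finished, copy the results to the tied hash.
tie my %db, 'CountingHash';
while (my ($k, $v) = each %work) {
    $db{$k} = $v;
}

print "stores: $CountingHash::stores\n";
```

The tied hash sees one write per distinct key rather than one per update. With a real database the tie line would be the BerkeleyDB::Hash tie shown above, and the copy loop would stay the same.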

Comments/Suggestions?

Re^3: Increasing CPU Usage/Decreasing Run Time
by BrowserUk (Patriarch) on Jul 25, 2005 at 22:57 UTC

    I've used DB_File rather than BerkeleyDB::Hash, but I assume that the options available are similar. They are mentioned for the different DB types here as a part of the DB_File docs. I would assume that you are using what DB_File refers to as a DB_File::HASHINFO. The parameters you probably need to consider varying are the cachesize, bsize & ffactor.

    However, the DB_File docs give no information on how to vary these options for performance. E.g., an ever bigger cache does not always give better performance.

    Optimising the options requires a fairly keen understanding of the nature of your data and the usage patterns of your application.
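
For reference, a rough sketch of how those three parameters are set through DB_File. The numbers are illustrative starting points only, not recommendations, and the temporary file name is made up for the example. BerkeleyDB::Hash takes its own option names, so treat this as showing the shape of the experiment rather than code to paste in.

```perl
use strict;
use warnings;
use Fcntl;                     # O_RDWR, O_CREAT
use DB_File;
use File::Temp qw(tempdir);

# Illustrative values only; the right numbers depend on your data.
my $info = DB_File::HASHINFO->new;
$info->{cachesize} = 8 * 1024 * 1024;   # bytes of in-memory cache
$info->{bsize}     = 4096;              # hash bucket size in bytes
$info->{ffactor}   = 16;                # target number of keys per bucket

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/tuning-test.db";       # hypothetical file name

tie my %db, 'DB_File', $path, O_RDWR|O_CREAT, 0644, $info
    or die "Cannot open $path: $!\n";

# Load some records, then count them back out of the database.
$db{"key$_"} = $_ for 1 .. 1000;
my $count = keys %db;
untie %db;

print "stored $count records\n";
```

Wrapping the load loop in a timer and re-running with different values of cachesize, bsize and ffactor is one simple way to do the trial and error.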

    I did find that the documentation here, particularly section 2, was useful, but be prepared for doing a lot of experimentation.

    The best guide I found to performance tuning was this page. Unfortunately, much of the advice relies upon your having access to one or more of the Berkeley DB utilities, which I never located for Win32. Nonetheless, the information on that page proved very useful as a guide to some trial & error testing.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
Re^3: Increasing CPU Usage/Decreasing Run Time
by diotalevi (Canon) on Jul 26, 2005 at 03:36 UTC
    perrin has written some great stuff on optimizing BerkeleyDB cache sizes and other IO-related matters. I'd suggest doing a Super Search for posts by perrin that also mention BerkeleyDB. There's a bunch in there and I'm not going to pre-filter it for you right now.