in reply to Increasing CPU Usage/Decreasing Run Time

Sounds like you're accessing a large volume of data through a small buffer (thereby forcing lots of IO)?

What options are you using when you create/open your DB?



Re^2: Increasing CPU Usage/Decreasing Run Time
by NathanE (Beadle) on Jul 25, 2005 at 22:25 UTC
    Yeah - that would make sense as well. The only thing that contradicts that is that I'm using tied hashes. I've untied the actual database from them and let them run just as plain hashes, and still encountered the same problem.

    Here's how I open the BerkeleyDB hash tables, though:

    use BerkeleyDB;

    # Create/open the on-disk hash table with default options
    tie %$file, "BerkeleyDB::Hash",
        -Filename => $file,
        -Flags    => DB_CREATE
        or die "Cannot open file '$file': $BerkeleyDB::Error\n";
    ############ UPDATE #############

    I just re-ran, again, with those hashes untied, and that's exactly what the problem is: too much IO to the hash table. Any idea how to speed this up? I'm thinking about a process that dumps to the hash table after all the hard processing is done.
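
    Something like this, maybe (an untested sketch: %buffer is just an illustrative name, and $file stands in for the database filename used above):

    use BerkeleyDB;

    my $file = 'data.db';   # assumed filename, as in the tie above

    # Do all the hard processing against a plain in-memory hash first ...
    my %buffer;
    # ... processing populates %buffer ...

    # ... then pay the tied-hash IO cost once, in a single bulk dump.
    tie my %db, "BerkeleyDB::Hash",
        -Filename => $file,
        -Flags    => DB_CREATE
        or die "Cannot open file '$file': $BerkeleyDB::Error\n";

    $db{$_} = $buffer{$_} for keys %buffer;
    untie %db;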

    Comments/Suggestions?

      I've used DB_File rather than BerkeleyDB::Hash, but I assume that the options available are similar. They are mentioned for the different DB types here as part of the DB_File docs. I would assume that you are using what DB_File refers to as a DB_File::HASHINFO. The parameters you probably need to consider varying are the cachesize, bsize & ffactor.
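
      Setting them looks something along these lines (a sketch only, following the DB_File docs; the numbers and filename are placeholders, not tuned recommendations):

      use DB_File;
      use Fcntl;

      # Placeholder values -- tune against your own data & access patterns.
      my $info = new DB_File::HASHINFO;
      $info->{cachesize} = 16 * 1024 * 1024;   # 16MB cache
      $info->{bsize}     = 8192;               # bucket size in bytes
      $info->{ffactor}   = 64;                 # desired keys per bucket

      my $filename = 'data.db';                # assumed filename
      tie my %h, 'DB_File', $filename, O_RDWR|O_CREAT, 0666, $info
          or die "Cannot open '$filename': $!\n";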

      However, the DB_File docs give no information on how to vary these options for performance. E.g. an ever-bigger cache does not always yield better performance.

      Optimising the options requires a fairly keen understanding of the nature of your data and the usage patterns of your application.

      I did find that the documentation here, particularly section 2, was useful, but be prepared to do a lot of experimentation.

      The best guide I found to performance tuning was this page. Unfortunately, much of the advice relies upon your having access to one or more of the Berkeley DB utilities, which I never located for Win32. Nonetheless, the information on that page proved very useful as a guide to some trial & error testing.
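
      For the BerkeleyDB module itself, the cache can also be enlarged directly in the tie call via its -Cachesize option. A sketch (the 64MB figure is an arbitrary placeholder, and $file is assumed to hold the database filename):

      use BerkeleyDB;

      my $file = 'data.db';                    # assumed filename
      tie my %db, 'BerkeleyDB::Hash',
          -Filename  => $file,
          -Flags     => DB_CREATE,
          -Cachesize => 64 * 1024 * 1024       # placeholder: 64MB cache
          or die "Cannot open file '$file': $BerkeleyDB::Error\n";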


      perrin has written some great stuff on optimizing BerkeleyDB cache sizes and other IO-related issues. I'd suggest doing a Super Search for posts by perrin that also use the word BerkeleyDB. There's a bunch in there and I'm not going to pre-filter it for you right now.