in reply to Further optimize usage of SDBM_File

I think you've got plenty of good ideas for possible optimizations. But first, have you profiled your app? If not, break out something like Devel::DProf or Devel::Profile. These can tell you if your DB access is really your bottleneck. If it is, try making these changes and validate them by re-running the profiler, or by using Benchmark (but be sure to re-profile before declaring success!).

And just because I can't help myself, you might consider the fact that you're probably double-hashing your data. You hash something to produce an MD5 key and then SDBM re-hashes that key into an internal key. That's probably a waste of time - perhaps you could feed SDBM a natural key instead, if you have one.
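If you want to measure whether the MD5 step earns its keep, a rough sketch along these lines might do. It assumes binary digests from Digest::MD5, made-up records of about 300 bytes, and a scratch directory; the helper name `fill` and the record contents are placeholders, not your real data.

```perl
use strict;
use warnings;
use Fcntl;
use SDBM_File;
use Digest::MD5 qw(md5);
use Benchmark qw(cmpthese);
use File::Temp qw(tempdir);

# Scratch directory, removed when the script exits.
my $dir = tempdir(CLEANUP => 1);

# Hypothetical stand-ins for the real data: 1000 strings of 300 bytes each.
my @records = map { sprintf('record-%06d-', $_) . ('x' x 286) } 1 .. 1000;

my $run = 0;

# Store every record under the key produced by $make_key, then look each
# one up again. Returns the number of records found on the second pass.
sub fill {
    my ($make_key) = @_;
    my $file = "$dir/bench-" . $run++;    # fresh database for every run
    tie my %db, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666
        or die "Cannot tie $file: $!";
    $db{ $make_key->($_) } = 1 for @records;
    my $found = grep { exists $db{ $make_key->($_) } } @records;
    untie %db;
    return $found;
}

# 20 runs each keeps this quick; raise the count for steadier numbers.
cmpthese(20, {
    md5_key     => sub { fill(sub { md5($_[0]) }) },
    natural_key => sub { fill(sub { $_[0] }) },
});
```

Bear in mind that SDBM limits the combined size of a key and its value (the SDBM_File docs put it at roughly 1 KB), so long natural keys leave less room for the value. Whatever this prints, check it against your real data before acting on it.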

-sam

Re^2: Further optimize usage of SDBM_File
by isync (Hermit) on Jun 14, 2007 at 19:26 UTC
    Yes, I did profile it with Devel::DProf. SDBM is not the biggest concern, but it may be the only thing left to optimize, as it is the last part that really accesses the disk...

    Does sdbm really "re-hash (my) key into an internal key"? The data I am hashing is about 300 bytes, and I did the hashing to reduce the data while getting a (quite) unique key... My understanding was that sdbm would use the supplied data 1:1 as the key, but if it really hashes it, I would revert to feeding it the original. Are you sure?

    What about feeding the data sorted? Is there any advantage in doing so?

    And what about reducing the page size? Has anyone tried that? (And can I anticipate what sdbm hashes a key to? If not, is sorting pointless?)
      Does sdbm really "re-hash (my) key into an internal key"? The data I am hashing is about 300 bytes, and I did the hashing to reduce the data while getting a (quite) unique key... My understanding was that sdbm would use the supplied data 1:1 as the key, but if it really hashes it, I would revert to feeding it the original. Are you sure?

      How could it implement a hash table without hashing the keys? I'm no SDBM expert, but this leads me to believe my guess is correct:

      http://www.partow.net/programming/hashfunctions/#SDBMHashFunction
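For illustration, here is my reading of the function described on that page, transcribed to Perl. The page gives the update step as hash = c + (hash << 6) + (hash << 16) - hash, which is the same as hash * 65599 + c. The wrap to 32 bits and the name `sdbm_hash` are my assumptions, and I have not checked this against the SDBM_File sources, so treat it as a sketch of the idea rather than the library's exact code.

```perl
use strict;
use warnings;

# SDBM-style string hash: for each byte, hash = hash * 65599 + byte,
# kept within 32 bits (the 32-bit width is an assumption).
sub sdbm_hash {
    my ($key) = @_;
    my $hash = 0;
    for my $byte (unpack 'C*', $key) {
        $hash = ($hash * 65599 + $byte) % 4294967296;
    }
    return $hash;
}

# Print the hash for a few sample keys.
printf "%-12s -> %u\n", $_, sdbm_hash($_) for qw(a ab natural-key);
```

The point is only that any key, whether an MD5 digest or the raw 300 bytes, goes through a step like this before the database decides where to put it.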

      I can't answer your more specific performance questions. I doubt anyone can, with the possible exception of the people who wrote SDBM. I suggest you set up some benchmarks and try it!
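As a starting point for the sorted-input question, a benchmark could look something like the sketch below. The keys are invented short strings, `load_in_order` is a hypothetical helper, and sorting here means plain string order, which may or may not be the order you had in mind.

```perl
use strict;
use warnings;
use Fcntl;
use SDBM_File;
use Benchmark qw(timethese);
use List::Util qw(shuffle);
use File::Temp qw(tempdir);

# Scratch directory, removed when the script exits.
my $dir = tempdir(CLEANUP => 1);

# Made-up keys; the zero-padded numbers mean @keys is already sorted.
my @keys     = map { sprintf 'key-%06d', $_ } 1 .. 5000;
my @shuffled = shuffle @keys;

my $run = 0;

# Insert the keys in the given order into a fresh database and return
# how many of them can be found afterwards.
sub load_in_order {
    my ($order) = @_;
    my $file = "$dir/order-" . $run++;
    tie my %db, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666
        or die "Cannot tie $file: $!";
    $db{$_} = 1 for @$order;
    my $stored = grep { exists $db{$_} } @$order;
    untie %db;
    return $stored;
}

# Time 20 loads for each insertion order.
timethese(20, {
    sorted   => sub { load_in_order(\@keys) },
    shuffled => sub { load_in_order(\@shuffled) },
});
```

If the two timings come out about the same on your data, sorting is probably not worth the effort.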

      -sam