Re: Re: Berkeley DB performance, profiling, and degradation...

It doesn't seem strange to me, particularly because DB_File is usually built against version 1.x of Berkeley DB, which is a long way behind the advances of version 3.x, or even 4.x.

There is an alternative interface, BerkeleyDB.pm, which is distributed with the latest versions of Berkeley DB, the ones which support terabytes and terabytes of data.

merlyn also hinted at the alternative interface ... it is a known fact that tie is somewhat slow.

I haven't been able to try it yet, due to various problems I've been having with my compiler setup on winblows ... I'd be interested to see how drastic the improvements are ;)
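For anyone who wants to measure the tie overhead without installing anything new: DB_File also exposes get() and put() methods on the object that tie returns, so the same database can be read with and without the tied-hash layer. The following is only a rough sketch, assuming an in-memory HASH database and an arbitrary record count; the timed() helper is made up for illustration, and the numbers will vary by machine.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(O_RDWR O_CREAT);
use Time::HiRes qw(time);

# In-memory database: passing undef as the filename keeps it off disk.
my %h;
my $db = tie %h, 'DB_File', undef, O_RDWR | O_CREAT, 0644, $DB_HASH
    or die "Cannot create in-memory DB: $!";

my $n = 50_000;    # arbitrary record count for this sketch
$db->put("key$_", "value $_") for 1 .. $n;

# Hypothetical helper: run a code ref and return elapsed wall-clock seconds.
sub timed {
    my ($code) = @_;
    my $start = time;
    $code->();
    return time - $start;
}

# Lookups through the tied hash (each access calls FETCH).
my $via_tie = timed(sub {
    for my $i (1 .. $n) {
        my $v = $h{"key$i"};
    }
});

# The same lookups through the object API, bypassing the tie layer.
my $via_api = timed(sub {
    for my $i (1 .. $n) {
        my $v;
        $db->get("key$i", $v) == 0    # get() returns 0 on success
            or die "key$i not found";
    }
});

printf "tied hash: %.3fs, get(): %.3fs\n", $via_tie, $via_api;
```

This only isolates the cost of the tie layer within DB_File; it says nothing about how BerkeleyDB.pm would compare.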

______crazyinsomniac_____________________________
Of all the things I've lost, I miss my mind the most.
perl -e "$q=$_;map({chr unpack qq;H*;,$_}split(q;;,q*H*));print;$q/$q;"

Re: Re: Re: Berkeley DB performance, profiling, and degradation... by perrin (Chancellor) on Feb 19, 2002 at 18:29 UTC
I say it's surprising because a hash algorithm is supposed to maintain a fairly constant lookup time when you put more data into it. Maybe switching between the hash and BTree options of DB_File would make a difference. I have used BerkeleyDB with the 3.x series from Sleepycat pretty extensively. The main advantages it offers are in the area of fancier locking and caching. With a single writer and the data on a RAM disk, these aren't likely to make much difference. It's worth a shot though.
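Switching between the two formats is a one-argument change in DB_File ($DB_HASH versus $DB_BTREE), so it is easy to check whether lookup time really stays flat as the record count grows. Here is a minimal sketch, assuming temporary files, made-up keys and values, and arbitrary record counts; time_lookups() is a hypothetical helper, not part of DB_File.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(O_RDWR O_CREAT);
use File::Temp qw(tempdir);
use Time::HiRes qw(time);

# Hypothetical helper: build a database of $count records in the given
# format and return the average time per lookup, in seconds.
sub time_lookups {
    my ($format, $count) = @_;
    my $dir  = tempdir(CLEANUP => 1);
    my $file = "$dir/test.db";

    my %h;
    my $db = tie %h, 'DB_File', $file, O_RDWR | O_CREAT, 0644, $format
        or die "Cannot open $file: $!";

    $h{ sprintf 'key%08d', $_ } = "value $_" for 1 .. $count;

    my $start = time;
    for my $i (1 .. $count) {
        my $v = $h{ sprintf 'key%08d', $i };
        die "missing record $i" unless defined $v && $v eq "value $i";
    }
    my $elapsed = time - $start;

    undef $db;     # drop the object reference before untie
    untie %h;
    return $elapsed / $count;
}

# Compare both formats at two sizes; a flat per-lookup time is the
# behaviour one would expect from a hash.
for my $count (1_000, 10_000) {
    printf "HASH  %6d records: %.2f us per lookup\n",
        $count, 1e6 * time_lookups($DB_HASH, $count);
    printf "BTREE %6d records: %.2f us per lookup\n",
        $count, 1e6 * time_lookups($DB_BTREE, $count);
}
```

Raising the counts toward the real data size would show where, if anywhere, the per-lookup time starts to climb.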
Re: Re: Re: Re: Berkeley DB performance, profiling, and degradation... by SwellJoe (Scribe) on Feb 19, 2002 at 22:46 UTC
This was my assumption as well (that lookups should be roughly constant at some point). But clearly it is not so. I've already tried switching to BTREE with no measurable result. I think having the db in RAM nullifies all of the tweaks that are available (like cachesize, etc.). One thing I have thought of, which might be helpful, is that I already have a hash value which is my key in the database. As I understand it, Berkeley DB then creates a new hash derived from my key to store the object. Any chance I could use my own hashes as record numbers or similar? (The hash I have for a key is a 32-byte hex MD5, which matches the Squid hash key for a given object.) That would avoid the key generation part of the STORE and FETCH. It might not be a benefit, though. I'll worry more about it if SDBM_File doesn't fix my problems.
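For reference, the cachesize tweak mentioned above is set through a DB_File::HASHINFO object passed in place of $DB_HASH. The sketch below also packs the 32-character hex MD5 into its 16 raw bytes before using it as a key. To be clear, this does not bypass Berkeley DB's own hashing of the key; it only halves the number of key bytes stored and hashed, and whether that helps is untested. The cache size, the URL, the stored value, and the pack_key()/unpack_key() helpers are all assumptions for illustration.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(O_RDWR O_CREAT);
use Digest::MD5 qw(md5_hex);

# Tuning object for the HASH format; 8 MB is an arbitrary example value.
my $info = DB_File::HASHINFO->new;
$info->{cachesize} = 8 * 1024 * 1024;

# In-memory database (undef filename), opened with the tuning object.
my %h;
my $db = tie %h, 'DB_File', undef, O_RDWR | O_CREAT, 0644, $info
    or die "Cannot create in-memory DB: $!";

# Hypothetical helpers: convert between the 32-character hex form of an
# MD5 and its 16-byte raw form.
sub pack_key   { return pack 'H32', $_[0] }
sub unpack_key { return unpack 'H32', $_[0] }

# Stand-in for a real Squid object key.
my $hex = md5_hex('http://example.com/');
my $raw = pack_key($hex);

$db->put($raw, 'object metadata') == 0
    or die "put failed";

my $value;
$db->get($raw, $value) == 0
    or die "get failed";

printf "%d-byte key packed to %d bytes, value: %s\n",
    length $hex, length $raw, $value;
```

unpack_key() recovers the hex form when the key needs to be matched against Squid again.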