Re: A memory efficient hash, trading off speed

I'm surprised caching has not been mentioned...

Vocabularies have frequently occurring words and rarely occurring words. It's safe to say that if you keep the 5,000-10,000 most frequent words in memory and read the rest from disk as needed, you should see barely any noticeable slowdown.

The usual way to implement such caching is Least Recently Used (LRU) expiration. You could put a cache on top of one of the DBM modules mentioned earlier. DB_File is my personal favourite, as it is flexible. BTrees are a bit more space efficient, but have a search time of around x * log N (N is the number of keys, x is the search time within a node), while hashes, which sometimes take up more space, have a typical search time of O(1) but may degrade as far as O(N). I would recommend using one of the modules in the Cache:: namespace on CPAN, or searching for LRU, and layering a cache on a DB_File BTree.
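To make the LRU idea concrete, here is a minimal sketch using only core Perl. It is not one of the CPAN Cache:: modules; the package name LRUCache, its methods, and the sample %disk hash are all made up for illustration. The backing store is any hash reference, so a hash tied to DB_File should slot in the same way, though that is an assumption and this sketch only uses a plain hash.

```perl
package LRUCache;   # hypothetical name, not a CPAN module
use strict;
use warnings;

sub new {
    my ($class, $backing, $limit) = @_;   # $limit is assumed to be >= 1
    return bless {
        backing => $backing,  # slow store: plain or tied hash reference
        limit   => $limit,    # maximum number of entries kept in memory
        node    => {},        # key => { val, prev, next }, linked by key
        head    => undef,     # most recently used key
        tail    => undef,     # least recently used key
        misses  => 0,         # lookups that had to go to the backing store
    }, $class;
}

# Detach a key from the recency list in O(1).
sub _unlink {
    my ($self, $key) = @_;
    my ($p, $x) = @{ $self->{node}{$key} }{qw(prev next)};
    if (defined $p) { $self->{node}{$p}{next} = $x } else { $self->{head} = $x }
    if (defined $x) { $self->{node}{$x}{prev} = $p } else { $self->{tail} = $p }
}

# Put a key at the front (most recently used end) of the list.
sub _push_front {
    my ($self, $key) = @_;
    my $n = $self->{node}{$key};
    $n->{prev} = undef;
    $n->{next} = $self->{head};
    $self->{node}{ $self->{head} }{prev} = $key if defined $self->{head};
    $self->{head} = $key;
    $self->{tail} = $key unless defined $self->{tail};
}

sub get {
    my ($self, $key) = @_;
    if (exists $self->{node}{$key}) {
        $self->_unlink($key);                  # hit: will be moved to front
    } else {
        $self->{misses}++;
        return unless exists $self->{backing}{$key};
        $self->{node}{$key} = { val => $self->{backing}{$key} };
        if (keys(%{ $self->{node} }) > $self->{limit}) {
            my $old = $self->{tail};           # expire least recently used
            $self->_unlink($old);
            delete $self->{node}{$old};
        }
    }
    $self->_push_front($key);
    return $self->{node}{$key}{val};
}

sub set {
    my ($self, $key, $val) = @_;
    $self->{backing}{$key} = $val;             # write through to slow store
    $self->{node}{$key}{val} = $val if exists $self->{node}{$key};
    return $val;
}

package main;

my %disk  = (the => 500, of => 300, zygote => 1);  # stand-in for a tied hash
my $cache = LRUCache->new(\%disk, 2);
$cache->get($_) for qw(the of the zygote the);
print "misses: $cache->{misses}\n";
```

Nodes are linked by key rather than by reference, which avoids circular references at the cost of one extra hash lookup per link.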

I have made a simple and relatively fast caching layer that accepts a hash reference (typically a tied hash) and a limit as arguments, and maintains O(1) expiration and storage time, although it does not necessarily fill the whole of the limit set on the in-memory hash. If you would like to see the source code, I would be more than happy to share it.

Update: I forgot to mention that DB_File needs to be layered under MLDBM to store hashes of hashes, and since freezing and thawing, as well as searching, cost time, keeping a smaller set of those hashes of hashes cached in memory will yield reasonable speed and gives you an alternative to paging.
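A rough sketch of that layering follows, assuming MLDBM, DB_File and Storable are installed. The file name vocab.db and the record layout are invented for the example. Every fetch thaws the whole record and every store freezes it again, which is the cost mentioned above.

```perl
use strict;
use warnings;
use Fcntl;                        # O_CREAT, O_RDWR
use File::Temp qw(tempdir);
use DB_File;                      # exports $DB_BTREE
use MLDBM qw(DB_File Storable);   # DB_File for storage, Storable to freeze/thaw

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/vocab.db";       # hypothetical file name

# Extra arguments after the class name are handed on to DB_File.
tie my %db, 'MLDBM', $file, O_CREAT|O_RDWR, 0640, $DB_BTREE
    or die "Cannot open $file: $!";

# Storing a nested structure freezes it into a single value.
$db{perl} = { count => 42, tags => ['noun'] };

# Nested elements cannot be changed in place through the tie:
# fetch the whole record, modify the copy, then store it back.
my $entry = $db{perl};
$entry->{count}++;
$db{perl} = $entry;

my $count = $db{perl}{count};     # thaws the record again
print "count: $count\n";
```

The fetch, modify, store-back dance is needed because MLDBM only sees reads and writes of top-level keys.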

-nuffin
zz zZ Z Z #!perl
