in reply to A memory efficient hash, trading off speed - does it already exist?
I'm surprised caching has not been mentioned...
Vocabularies have high-frequency words and low-frequency words. It's safe to say that if you keep the 5,000-10,000 most frequent words in memory and read the rest from disk as needed, you should barely notice a speed decrease.
The usual way to implement such caching is called Least Recently Used (LRU) expiration. You could use a cache on top of one of the aforementioned DBM modules. DB_File is my personal favourite, as it is flexible. BTrees are a bit more space efficient, but have a search time of around O(log N), where N is the number of keys and the base of the logarithm is set by how many keys fit in a node. Hashes, which take up more space at times, have a typical search time of O(1), but may go as far as O(N). I would recommend using one of the modules in the Cache:: namespace on CPAN, or searching for LRU, and layering a cache on a DB_File BTree.
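To make the layering idea concrete, here is a minimal sketch of an LRU read cache sitting in front of a slow hash. The package name LRUCache and its fields are made up for illustration, and a plain hash stands in for the tied DB_File hash so the example runs on its own. Eviction sorts the access ticks, which is simple rather than fast; a real module would likely do better.

```perl
use strict;
use warnings;

# Hypothetical helper, not an existing CPAN module: an LRU read cache
# in front of any hash reference (plain, or tied to a DBM file).
package LRUCache;

sub new {
    my ($class, $store, $limit) = @_;
    return bless {
        store  => $store,   # slow backing hash
        limit  => $limit,   # maximum number of keys held in memory
        cache  => {},       # key => cached value
        used   => {},       # key => tick of the last access
        tick   => 0,
        misses => 0,        # lookups that were not answered from the cache
    }, $class;
}

sub get {
    my ($self, $key) = @_;
    unless (exists $self->{cache}{$key}) {
        $self->{misses}++;
        return undef unless exists $self->{store}{$key};
        $self->_evict if scalar(keys %{ $self->{cache} }) >= $self->{limit};
        $self->{cache}{$key} = $self->{store}{$key};
    }
    $self->{used}{$key} = ++$self->{tick};   # mark as most recently used
    return $self->{cache}{$key};
}

sub _evict {
    my ($self) = @_;
    my $used = $self->{used};
    # Sort by last-access tick and drop the oldest key.
    # O(n log n) per eviction; a linked list would make this O(1).
    my ($oldest) = sort { $used->{$a} <=> $used->{$b} } keys %$used;
    delete $self->{cache}{$oldest};
    delete $used->{$oldest};
}

package main;

# A plain hash stands in for the on-disk vocabulary here.
my %store = (the => 1, of => 2, and => 3, zygote => 4);
my $cache = LRUCache->new(\%store, 2);
$cache->get($_) for qw(the of the and);   # "of" is the least recently used
```

With a limit of 2, the frequently requested "the" stays in memory while "of" gets pushed out when "and" arrives, which is the behaviour you want for a vocabulary with a few very common words.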
I have made a simple and relatively fast caching layer that accepts a hash reference (tied hashes, I guess) and a limit as arguments, and maintains O(1) expiration and storage time, albeit not necessarily filling the whole of the limit set on the memory hash. If you would like to view the source code I would be more than happy to share.
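The source isn't shown here, so the following is only a guess at one scheme with those properties, not the actual module. It keeps two generations of at most half the limit each; when the young one fills up, the old one is thrown away in one go. Lookups and expiry are constant time (amortized), and the cache holds somewhere between half the limit and the full limit. The name make_cache is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch: a two-generation cache in front of a hash reference.
sub make_cache {
    my ($store, $limit) = @_;            # $store may be a tied hash
    my $half = int($limit / 2) || 1;     # size of each generation
    my ($young, $old) = ({}, {});

    return sub {
        my ($key) = @_;
        return $young->{$key} if exists $young->{$key};

        my $value;
        if (exists $old->{$key}) {
            $value = $old->{$key};       # still remembered: promote it
        }
        else {
            return undef unless exists $store->{$key};
            $value = $store->{$key};     # slow path: read the backing hash
        }

        # Young generation full: it becomes the old one, and the
        # previous old generation expires all at once.
        if (scalar(keys %$young) >= $half) {
            ($old, $young) = ($young, {});
        }
        return $young->{$key} = $value;
    };
}

# A plain hash stands in for the tied one.
my %store = (the => 1, of => 2, and => 3, zygote => 4);
my $get = make_cache(\%store, 4);
print $get->($_), "\n" for qw(the of and the zygote);
```

The trade-off is that expiry is only roughly least-recently-used: anything not touched during the last one or two generations goes, rather than exactly the single oldest key.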
Update: I forgot to mention that DB_File needs to be layered under MLDBM, and since freezing and thawing, as well as searching, will cost time, caching a smaller set of hashes of hashes will yield reasonable times, and gives you an alternative to paging.
-nuffin
zz zZ Z Z #!perl