in reply to size on disk of tied hashes
Build your own index to the data.
Use the MD5 of the key (binary, 128 bits = 16 bytes) plus the file position (64 bits = 8 bytes) = 24 bytes per record.
24 bytes * ~500 million records ~= 12 billion bytes, or roughly an 11 GB index file.
Sort by the MD5.
With fixed-length records, writing a binary chop (binary search) to locate the record's offset is relatively easy and gives you O(log n) access time: about 29 probes at most for 500 million records.
That still pushes you beyond your 40GB disk, but 60GB disks aren't that much more expensive.
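
For illustration, here is a minimal sketch of the index scheme described above, written in Python rather than Perl. The names `RECORD`, `build_index` and `lookup` are made up for this example, and it sorts the records in memory, which is only workable for small test sets; an index of 500 million records would need an external (on-disk) sort instead.

```python
import hashlib
import struct

# One index record: 16-byte binary MD5 digest + 64-bit unsigned file offset.
# Big-endian with no padding, so every record is exactly 24 bytes.
RECORD = struct.Struct(">16sQ")


def build_index(pairs, index_path):
    """Write a sorted index file from (key_bytes, data_file_offset) pairs.

    Assumption: the pairs fit in memory. A real 500-million-record index
    would have to be sorted on disk.
    """
    records = sorted((hashlib.md5(key).digest(), offset) for key, offset in pairs)
    with open(index_path, "wb") as out:
        for digest, offset in records:
            out.write(RECORD.pack(digest, offset))


def lookup(index_path, key):
    """Binary chop over the fixed-length records; returns the offset or None."""
    target = hashlib.md5(key).digest()
    with open(index_path, "rb") as idx:
        idx.seek(0, 2)  # seek to the end to learn the file size
        lo, hi = 0, idx.tell() // RECORD.size
        while lo < hi:
            mid = (lo + hi) // 2
            idx.seek(mid * RECORD.size)  # fixed length makes this a plain multiply
            digest, offset = RECORD.unpack(idx.read(RECORD.size))
            if digest == target:
                return offset
            if digest < target:
                lo = mid + 1
            else:
                hi = mid
    return None
```

The offset returned by `lookup` is where you would seek in the data file to read the actual record. Each probe costs one seek and one 24-byte read, so nothing but the record being compared is ever held in memory.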