in reply to Re^2: Speeding a disk based hash
in thread Speeding a disk based hash
Well, despite my asking, you still haven't really supplied any useful detail. Something like "Out Of Memory error at 950MB with 14GB free RAM" would be the appropriate level of detail, given you say you have a lot of memory.
Now this is total speculation, but you call it an intermediate step. This makes me think you are doing either a merge or a filter based on the content of the hash. Either case can be dealt with by using a merge sort strategy. If the data in your hash is stored in a sorted flat file (sorted by hash key), and the data it is to be merged with or filtered against is sorted the same way, then you can very efficiently make a single pass through both files in lockstep, generating a final output file. The basic algorithm is to open both files and read a line from each. If the keys are the same, do the merge and output the result, then read the next line from each file. If not, read another line from the file whose key is less than the other file's key. Thus you rapidly walk both files and find all matching keys, while only ever holding one line from each file in memory.
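To make the lockstep walk concrete, here is a minimal sketch of it. It is written in Python only for brevity; the same loop translates directly to Perl. Everything specific in it is an assumption for illustration, not something from your description: the function name merge_sorted is hypothetical, each line is assumed to be "key<TAB>value", each key is assumed to appear at most once per file, and both files are assumed to be sorted by plain string comparison of the key.

```python
import io


def merge_sorted(left, right, sep="\t"):
    """Walk two key-sorted line streams in lockstep.

    Yields (key, left_value, right_value) for every key present in both.
    Assumes unique keys per stream and the same sort order in both.
    """
    def records(fh):
        # Split each line into (key, value) at the first separator.
        for line in fh:
            key, _, value = line.rstrip("\n").partition(sep)
            yield key, value

    left_iter, right_iter = records(left), records(right)
    l = next(left_iter, None)
    r = next(right_iter, None)

    # Stop as soon as either stream runs out: no more matches are possible.
    while l is not None and r is not None:
        if l[0] == r[0]:
            # Keys match: emit the merged record and advance both streams.
            yield l[0], l[1], r[1]
            l = next(left_iter, None)
            r = next(right_iter, None)
        elif l[0] < r[0]:
            # Left key is smaller: advance the left stream only.
            l = next(left_iter, None)
        else:
            # Right key is smaller: advance the right stream only.
            r = next(right_iter, None)


# Small in-memory demo; real use would pass two open file handles instead.
hash_dump = io.StringIO("apple\t1\nbanana\t2\ncherry\t3\n")
other = io.StringIO("banana\tyellow\ncherry\tred\ndate\tbrown\n")
for key, a, b in merge_sorted(hash_dump, other):
    print(key, a, b, sep="\t")
```

The important caveat is that both files must be sorted with the same comparison the loop uses. If one file is sorted numerically or under a different locale, keys will be skipped silently.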
cheers
tachyon