in reply to Re: Speeding a disk based hash
in thread Speeding a disk based hash

From your comments on access time, it doesn't sound as though I can greatly speed things with this disk-based approach. This is what I suspected, though I had hoped to find wisdom pointing me to another solution.

Thanks.

Replies are listed 'Best First'.
Re^3: Speeding a disk based hash
by tachyon (Chancellor) on Oct 11, 2004 at 03:19 UTC

    Well, despite my asking, you still don't really supply useful detail. Probably an Out Of Memory error at 950 MB with 14 GB of free RAM is appropriate, given you say you have a lot of memory.

    Now this is total speculation, but you call it an intermediate step. This makes me think you are either doing a merge or a filter based on the content of the hash. Either case can be dealt with by using a merge sort strategy. If the data in your hash is stored in a sorted flat file (sorted by hash key), and the data it is to be merged with/filtered against is similarly sorted, then you can very efficiently make a single pass through both files in lockstep, generating a final output file. The basic algorithm is to open both files and read a line from each. If the keys are the same, do a merge and output; if not, read another line from the file where the key < other_file_key. Thus you rapidly walk both files and find all matching keys.
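    The lockstep merge described above can be sketched as follows. This is a minimal illustration in Python rather than Perl; the file paths, the "key\tvalue" row layout, and the function name merge_join are all assumptions, and it assumes each key appears at most once per sorted file.

    ```python
    # Lockstep merge of two flat files sorted by key, as described above.
    # Rows are assumed to be "key<TAB>value" lines; only matching keys are output.
    def merge_join(path_a, path_b, out_path):
        with open(path_a) as fa, open(path_b) as fb, open(out_path, "w") as out:
            line_a, line_b = fa.readline(), fb.readline()
            while line_a and line_b:
                key_a, val_a = line_a.rstrip("\n").split("\t", 1)
                key_b, val_b = line_b.rstrip("\n").split("\t", 1)
                if key_a == key_b:
                    # Keys match: merge the two records and advance both files.
                    out.write(f"{key_a}\t{val_a}\t{val_b}\n")
                    line_a, line_b = fa.readline(), fb.readline()
                elif key_a < key_b:
                    # File A is behind: advance it to catch up.
                    line_a = fa.readline()
                else:
                    # File B is behind: advance it instead.
                    line_b = fb.readline()
    ```

    Because each file is read exactly once, front to back, this runs in linear time and constant memory regardless of how large the files are.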

    cheers

    tachyon

Re^3: Speeding a disk based hash
by gmpassos (Priest) on Oct 11, 2004 at 02:58 UTC
    How about using another DB to hold this hash? Just create a table with a simple DB like SQLite, and see if you can gain some speed with that.
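    A minimal sketch of the idea, shown here in Python's built-in sqlite3 module for illustration (the same approach works from Perl via DBI/DBD::SQLite); the table name kv, the key/value schema, and the helper names are all assumptions:

    ```python
    # Replace a too-large in-memory hash with a keyed SQLite table on disk.
    import sqlite3

    def open_disk_hash(path):
        con = sqlite3.connect(path)
        # PRIMARY KEY gives an index, so lookups don't scan the whole table.
        con.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
        return con

    def put(con, k, v):
        # INSERT OR REPLACE mimics hash assignment: last write wins.
        con.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (k, v))

    def get(con, k):
        row = con.execute("SELECT v FROM kv WHERE k = ?", (k,)).fetchone()
        return row[0] if row else None
    ```

    Wrapping many puts in a single transaction (one commit at the end) is usually the difference between SQLite feeling slow and feeling fast for bulk loads.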

    Graciliano M. P.
    "Creativity is the expression of the liberty".