in reply to Re: Speeding a disk based hash
in thread Speeding a disk based hash

From your comments on access time, it doesn't sound as though I can greatly speed things with this disk-based approach. This is what I suspected, though I had hoped to find wisdom pointing me to another solution.

Thanks.

Replies are listed 'Best First'.
Re^3: Speeding a disk based hash
by tachyon (Chancellor) on Oct 11, 2004 at 03:19 UTC

    Well, despite my asking, you still don't really supply useful detail. Probably an Out Of Memory error at 950 MB with 14 GB of free RAM is appropriate, given you say you have a lot of memory.

    Now this is total speculation, but you call it an intermediate step. This makes me think you are either doing a merge or a filter based on the content of the hash. Either case can be dealt with by using a merge sort strategy. If the data in your hash is stored in a sorted flat file (sorted by hash key), and the data it is to be merged with/filtered against is similarly sorted, then you can very efficiently make a single pass through both files in lockstep, generating a final output file. The basic algorithm is to open both files and read a line from each. If the keys are the same, do a merge and output; if not, read another line from the file where the key < other_file_key. Thus you rapidly walk both files and find all matching keys.
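    The lockstep merge described above can be sketched as follows. This is a minimal illustration in Python rather than Perl; the file paths, the "key\tvalue" row layout, and the function name merge_join are all assumptions, and it assumes each key appears at most once per sorted file.

    ```python
    # Lockstep merge of two flat files sorted by key, as described above.
    # Rows are assumed to be "key<TAB>value" lines; only matching keys are output.
    def merge_join(path_a, path_b, out_path):
        with open(path_a) as fa, open(path_b) as fb, open(out_path, "w") as out:
            line_a, line_b = fa.readline(), fb.readline()
            while line_a and line_b:
                key_a, val_a = line_a.rstrip("\n").split("\t", 1)
                key_b, val_b = line_b.rstrip("\n").split("\t", 1)
                if key_a == key_b:
                    # Keys match: merge the two records and advance both files.
                    out.write(f"{key_a}\t{val_a}\t{val_b}\n")
                    line_a, line_b = fa.readline(), fb.readline()
                elif key_a < key_b:
                    # File A is behind: advance it to catch up.
                    line_a = fa.readline()
                else:
                    # File B is behind: advance it instead.
                    line_b = fb.readline()
    ```

    Because each file is read exactly once, front to back, this runs in linear time and constant memory regardless of how large the files are.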

    cheers

    tachyon

Re^3: Speeding a disk based hash
by gmpassos (Priest) on Oct 11, 2004 at 02:58 UTC
    How about using another DB to hold this hash? Just create a table with a simple DB like SQLite, and see if you can gain some speed with that.
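    A minimal sketch of the idea, shown here in Python's built-in sqlite3 module for illustration (the same approach works from Perl via DBI/DBD::SQLite); the table name kv, the key/value schema, and the helper names are all assumptions:

    ```python
    # Replace a too-large in-memory hash with a keyed SQLite table on disk.
    import sqlite3

    def open_disk_hash(path):
        con = sqlite3.connect(path)
        # PRIMARY KEY gives an index, so lookups don't scan the whole table.
        con.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
        return con

    def put(con, k, v):
        # INSERT OR REPLACE mimics hash assignment: last write wins.
        con.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (k, v))

    def get(con, k):
        row = con.execute("SELECT v FROM kv WHERE k = ?", (k,)).fetchone()
        return row[0] if row else None
    ```

    Wrapping many puts in a single transaction (one commit at the end) is usually the difference between SQLite feeling slow and feeling fast for bulk loads.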

    Graciliano M. P.
    "Creativity is the expression of the liberty".