in reply to Re: fast lookups in files
in thread fast lookups in files

Scan the file and build a hash of (key,file position) pairs so you can immediately seek to the line you want.

Well, if a hash of key => values doesn't fit into memory I 'm afraid that a hash with keys => positions will not fit either.

Alternatively, sort the file and use a binary search to locate the line. This way, you'd need roughly log2(line_count) seek/reads to get the data you're looking for.

That would be a better solution (the file is already sorted)

Thanks

citromatik

Replies are listed 'Best First'.
Re^3: fast lookups in files
by apl (Monsignor) on Feb 05, 2008 at 13:35 UTC
    You could have an array where subscript N indicates key N thousand, and the value of the array element would be the disk offset to look at.

    Let's assume you have fixed length records of 30 bytes. Element 21 having value 999990 would mean the first key greater than or equal to 21,000 was stored at disk offset 999,990.

    Having said all that, I'd also suggest using a binary search.