in reply to fast lookups in files

Since you know that the large file is already sorted, the most efficient processing technique is to sort the other input file(s) by the same key. Then you can process the two streams side by side, sequentially: no “search” is involved.
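
For illustration, here is a minimal sketch of that side-by-side pass in Perl. The file names, the tab-separated key layout, and the action on a match (printing the master record) are all assumptions made for the example:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Both files sorted ascending on the key (first tab-separated field).
    open my $big,   '<', 'master.sorted'  or die "master.sorted: $!";
    open my $small, '<', 'queries.sorted' or die "queries.sorted: $!";

    my $m = <$big>;
    my $q = <$small>;

    while ( defined $m and defined $q ) {
        my ($mkey) = split /\t/, $m, 2;
        my ($qkey) = split /\t/, $q, 2;

        if    ( $mkey lt $qkey ) { $m = <$big>   }  # master behind: advance it
        elsif ( $mkey gt $qkey ) { $q = <$small> }  # no match for this query
        else {                                      # keys equal: a hit
            print $m;
            $q = <$small>;
        }
    }

Each file is read exactly once, front to back, so the cost is dominated by the sorts rather than by any per-record searching.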

If it is at all possible to do this, then this is what you should do. “Inconvenience yourself” to do it this way: you'll be glad you did.

You do not have to worry about updating the original file in place. Write the changes to another file in the same format, sort that file by the same key, then merge the two to produce an updated master file. In each case you're doing the job by means of sorts (which are surprisingly fast) and sequential reads. When you're finished, you'll have the original master file (unchanged), the delta file (now sorted), and the updated master file.
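
A sketch of that update pass, again in Perl. It assumes records keyed on the first tab-separated field, that an external sort(1) is available (a whole-line lexical sort, which orders by the leading key), and that a delta record simply replaces the master record with the same key; all file names are illustrative:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Sort the delta file into the same key order as the master.
    system( 'sort', '-o', 'delta.sorted', 'delta.txt' ) == 0
        or die "sort failed: $?";

    open my $old, '<', 'master.sorted'  or die $!;
    open my $dlt, '<', 'delta.sorted'   or die $!;
    open my $new, '>', 'master.updated' or die $!;

    my $o = <$old>;
    my $d = <$dlt>;

    while ( defined $o or defined $d ) {
        my ($okey) = defined $o ? split( /\t/, $o, 2 ) : ();
        my ($dkey) = defined $d ? split( /\t/, $d, 2 ) : ();

        if ( !defined $d or ( defined $o and $okey lt $dkey ) ) {
            print {$new} $o;  $o = <$old>;   # master record unchanged
        }
        elsif ( !defined $o or $okey gt $dkey ) {
            print {$new} $d;  $d = <$dlt>;   # new record from the delta
        }
        else {
            print {$new} $d;                 # delta replaces master record
            $o = <$old>;  $d = <$dlt>;
        }
    }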

Yes, that is exactly how data processing was done, using punched cards, long before digital computers were invented... And it worked.

Failing that, an appropriate strategy would be to use something like DB_File (e.g. a Berkeley DB) ... but beware: random seeks are time-consuming in large quantities.
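
If you do take that route, a tied hash keeps the Perl side simple. A minimal sketch with DB_File, using a B-tree file; the database name and key are made up for the example, and the one-time load of the big file into lookup.db is not shown:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DB_File;
    use Fcntl qw( O_RDONLY );

    # Tie a hash to an existing Berkeley DB B-tree file, read-only.
    tie my %db, 'DB_File', 'lookup.db', O_RDONLY, 0644, $DB_BTREE
        or die "Cannot tie lookup.db: $!";

    my $key = 'some-key';    # illustrative
    print exists $db{$key} ? $db{$key} : 'not found', "\n";

    untie %db;

Each fetch is a disk-backed B-tree probe, which is fine for occasional lookups but, as noted above, adds up quickly if you do millions of them.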
