Re: Efficiency of seek

You can also learn a lesson from all those spinning magnetic tapes, and all those punched cards, in campy science-fiction movies.

In the years before computers existed, data was processed very successfully using the strategy of sorting one or more data streams according to the same key, after which it could be processed by one sequential run through one-or-more files in parallel. Even when you are using an indexed file, much the same benefit can be obtained by sorting the file of keys that are being sought.

When you know that you are dealing with sorted files, you know three things to be true:

All occurrences of any given key-value will be adjacent.
If any gaps occur in the sequence, no records with those key-values exist anywhere.

External sorting operations are among the most-studied algorithms, and they are exceptionally efficient. But notice that I said, external sorting. “Memory” is virtual, hence it is actually a disk file, and virtual memory paging results (more or less) in random seeks. “In-memory” operations, including hashing and lists, are considerably less efficient than external sorting based algorithms when “thrashing” begins in earnest – by several orders of magnitude.