in reply to Re: need for speed - how fast is a hash
in thread need for speed - how fast is a hash

Hashes are fast until they fill up.

If a Hash is filled up beyond a certain level, it is automatically extended to a bigger size. That doesn't make it slow (it just makes one growing operation slow).

If it doesn't fit into memory anymore, it becomes slower (of course), but then it's due to size, not due to filling up. All other data structures suffer from the same problem.

The fastest way (believe it or not...) to detect and to remove duplicates is to do exactly what people were doing with large-to-them datasets before computers existed. Namely, “sort the file,” using an on-disk sort.

I'm curious, why should sorting a file on disc be faster than building a hash map on disc? Better optimization for sequential access comes to mind, but it's still O(N) vs. O(N log N)

  • Comment on Re^2: need for speed - how fast is a hash