in reply to Hash Search is VERY slow


Re^2: Hash Search is VERY slow
by Anonymous Monk on Sep 29, 2021 at 17:44 UTC
    According to Knuth's seminal book, The Art of Computer Programming, Volume 3: Sorting and Searching, an external merge sort has a complexity of O(n log n), and this holds true for any data volume: it will never "hit the wall." Once the data has been [externally ...] sorted in this way, it becomes trivial to find which URL keys occur and how many instances of each distinct value exist: a single sequential read of the sorted file tells you all of this at once.
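    A minimal sketch of the sort-then-scan idea described above, assuming the URL keys arrive as strings with no embedded newlines. The function names and the chunk_size parameter are hypothetical, chosen for illustration. Unlike a classic multi-pass external merge, this version merges all sorted runs in a single k-way pass with heapq.merge, so it keeps one open file per run.

```python
import heapq
import itertools
import os
import tempfile
from typing import Iterable, Iterator, List, Tuple


def _write_run(sorted_keys: List[str], tmpdir: str) -> str:
    """Write one sorted chunk (a "run") to a temp file, one key per line."""
    fd, path = tempfile.mkstemp(dir=tmpdir, text=True)
    with os.fdopen(fd, "w") as f:
        for key in sorted_keys:
            f.write(key + "\n")
    return path


def _read_run(path: str) -> Iterator[str]:
    """Stream keys back from a run file without loading it all into memory."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")


def external_sort(keys: Iterable[str], chunk_size: int, tmpdir: str) -> Iterator[str]:
    """Phase 1: sort chunk_size keys at a time in memory and spill each run to disk.
    Phase 2: lazily k-way merge all runs with heapq.merge."""
    run_paths = []
    it = iter(keys)
    while True:
        chunk = list(itertools.islice(it, chunk_size))
        if not chunk:
            break
        chunk.sort()
        run_paths.append(_write_run(chunk, tmpdir))
    yield from heapq.merge(*(_read_run(p) for p in run_paths))


def count_sorted(sorted_keys: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """One sequential pass: equal keys are adjacent, so groupby counts each run."""
    for key, group in itertools.groupby(sorted_keys):
        yield key, sum(1 for _ in group)


if __name__ == "__main__":
    urls = ["b.com", "a.com", "c.com", "a.com", "b.com", "a.com"]
    # The temp directory must outlive consumption of the lazy merge.
    with tempfile.TemporaryDirectory() as d:
        print(list(count_sorted(external_sort(urls, chunk_size=2, tmpdir=d))))
    # prints [('a.com', 3), ('b.com', 2), ('c.com', 1)]
```

    Memory use is bounded by chunk_size plus one buffered line per run, which is what lets the approach scale past available RAM; the counting pass never needs a hash at all.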

      Oh, look. The monkey can copy text out of a book.