http://qs1969.pair.com?node_id=11137116


in reply to Hash Search is VERY slow

This node falls below the community's threshold of quality.

Re^2: Hash Search is VERY slow
by Anonymous Monk on Sep 29, 2021 at 17:44 UTC
    According to Knuth's seminal book Sorting and Searching, an external merge sort has a complexity of O(n log(n)) and this will hold true for any data volume: it will never "hit the wall." Once the data has been [externally ...] sorted in this way, it now becomes trivial to know which URL-keys occur and also to know how many instances exist of each distinct value: a simple sequential read of the sorted file will tell you all of this at once.

      oh, look. the monkey can copy text out of a book.
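
For what it's worth, the "simple sequential read" in the quoted text is easy to sketch. The following is only a minimal illustration, assuming the keys have already been sorted (externally or otherwise) and arrive one per line. The function name `count_sorted_keys` and the sample URLs are made up for the example and do not come from the original thread. Because equal keys are adjacent after sorting, one pass with constant memory yields every distinct key and its count.

```python
from itertools import groupby


def count_sorted_keys(sorted_lines):
    """Yield (key, count) for each distinct key in an already-sorted stream.

    Only the current run is held in memory, so this works on input far
    larger than RAM, provided the input really is sorted.
    """
    keys = (line.rstrip("\n") for line in sorted_lines)
    for key, run in groupby(keys):
        # Consume the run before groupby advances to the next key.
        yield key, sum(1 for _ in run)


# Hypothetical sample standing in for the externally sorted file.
sample = [
    "a.example/x\n",
    "a.example/x\n",
    "b.example/y\n",
    "c.example/z\n",
    "c.example/z\n",
    "c.example/z\n",
]
counts = list(count_sorted_keys(sample))
```

With a real file you would pass the open file handle instead of the list, e.g. `count_sorted_keys(open(path))`, where `path` is whatever your sort step produced.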