note
Anonymous Monk
According to Knuth's seminal book <i>Sorting and Searching</i> (Volume 3 of <i>The Art of Computer Programming</i>), an external merge sort has a complexity of <i>O(n log(n))</i>, and this holds true for any data volume: it will never "hit the wall." Once the data has been [externally ...] sorted in this way, it becomes trivial to know which URL-keys occur and how many instances exist of each distinct value: equal keys are now adjacent, so a single <i>sequential</i> read of the sorted file will tell you all of this at once.
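To make the idea concrete, here is a minimal sketch in Python (chosen only for brevity; nothing in this thread prescribes it). It assumes the keys arrive one per item with no embedded newlines, and the names sorted_runs, read_keys, count_keys and the chunk_size parameter are all made up for illustration. It sorts memory-sized chunks into temporary files, merges them, and counts runs of equal keys in one sequential pass. It is not a tuned implementation: a real external sort would limit how many run files are open at once and merge in several passes.

```python
import heapq
import itertools
import tempfile


def sorted_runs(keys, chunk_size):
    """Yield temp files, each holding one sorted chunk of at most chunk_size keys."""
    it = iter(keys)
    while True:
        chunk = sorted(itertools.islice(it, chunk_size))
        if not chunk:
            return
        run = tempfile.TemporaryFile(mode="w+", encoding="utf-8")
        run.writelines(key + "\n" for key in chunk)
        run.seek(0)  # rewind so the run can be read back during the merge
        yield run


def read_keys(run):
    """Stream the keys of one sorted run, without the trailing newline."""
    for line in run:
        yield line.rstrip("\n")


def count_keys(keys, chunk_size=100_000):
    """Yield (key, count) pairs in sorted key order.

    Assumption: chunk_size keys fit comfortably in memory; everything
    else lives on disk in the temporary run files.
    """
    runs = list(sorted_runs(keys, chunk_size))
    try:
        # heapq.merge lazily merges the already-sorted runs.
        merged = heapq.merge(*(read_keys(run) for run in runs))
        # Equal keys are adjacent after the merge, so one pass counts them.
        for key, group in itertools.groupby(merged):
            yield key, sum(1 for _ in group)
    finally:
        for run in runs:
            run.close()
```

For example, list(count_keys(["b.com", "a.com", "b.com"], chunk_size=2)) gives each distinct key once, in sorted order, paired with its number of occurrences. Only the current key and its running count are held in memory during the counting pass, which is why the data volume does not matter.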
11137097
11137116
-1