in reply to Re^5: performance issues with grepping large sorted arrays
in thread performance issues with grepping large sorted arrays

Indeed, if there's no other reason to have the array sorted than to do a binary search instead of a short-circuited linear search, there may not be much overall difference between the n/2 vs. the n * log2(n) + log2(n).

In this case, we're talking about m*n, so O(m*n/2) vs. O( n * log2(n) + m * log2(n) ). (1000 * 400k) for short-circuit linear with no sort or (800k * 20 + 1000 * 20) for sort and binary search. 400,000,000 vs. 16,020,000 typical. Although it could be worst case 800,000,000 vs 640,000,020,000.

So between the two, there's better typical time on the quicksort and binary search, but more time stability on the short-circuited linear search with no sort. Pathological cases would hopefully be rare.

All of which is still, in this case, much ado about nothing. There should be 800,000 hash lookups for comparison and no searching. The method with 0.002 as many comparisons as one or 0.05 as many comparisons as the other is the clear winner.
  • Comment on Re^6: performance issues with grepping large sorted arrays