in reply to Re^4: performance issues with grepping large sorted arrays
in thread performance issues with grepping large sorted arrays

The point is: ...

If you think that this is The point, could you elaborate on its practical relevance with respect to the statement I originally (implicitly) made, which is just that skipping unnecessary iterations/comparisons is faster (in the typical case) than doing full iterations every time (like grep). This holds even though the upper bound expressed in both algorithms' Big O complexity classification is the same. I never made any statement as to the method being O(N) or not. Also, no doubt that a binary search on a sorted list would be faster...

Replies are listed 'Best First'.
Re^6: performance issues with grepping large sorted arrays
by mr_mischief (Monsignor) on Jul 24, 2007 at 20:13 UTC
    Indeed, if there's no other reason to have the array sorted than to do a binary search instead of a short-circuited linear search, there may not be much overall difference between the n/2 vs. the n * log2(n) + log2(n).

    In this case, we're talking about m*n, so O(m*n/2) vs. O( n * log2(n) + m * log2(n) ). (1000 * 400k) for short-circuit linear with no sort or (800k * 20 + 1000 * 20) for sort and binary search. 400,000,000 vs. 16,020,000 typical. Although it could be worst case 800,000,000 vs 640,000,020,000.

    So between the two, there's better typical time on the quicksort and binary search, but more time stability on the short-circuited linear search with no sort. Pathological cases would hopefully be rare.

    All of which is still, in this case, much ado about nothing. There should be 800,000 hash lookups for comparison and no searching. The method with 0.002 as many comparisons as one or 0.05 as many comparisons as the other is the clear winner.