in reply to Re^12: In-place sort with order assignment (runs)
in thread In-place sort with order assignment

I believe that O(0.5 * (N + U) * log N) is a good approximation of the complexity of a combined mergesort-unique algorithm.

The logic behind this is that there are log N merge steps to perform. In the lower merge steps the probability of duplicates is very low, so the number of comparisons will be proportional to N. In the higher merge steps, on the other hand, the probability of duplicates is very high, so the number of comparisons will be proportional to U.

We can optimistically assume that the mean number of comparisons per step is (N + U)/2, so the total number of comparisons becomes proportional to ((N + U)/2) * log N.

And obviously, that can be simplified to O(N*log N).
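
For concreteness, here is a minimal Perl sketch of one such duplicate-collapsing merge step (the merge_unique name and interface are mine for illustration; this is not tye's actual code). It assumes numeric keys and that earlier steps have already deduplicated each input run:

    use strict;
    use warnings;

    # Merge two sorted, already-deduplicated runs, emitting each key once.
    # Lower merge steps see few cross-run duplicates, so they do work
    # proportional to N; higher steps see many, so their output shrinks
    # toward the U unique keys.
    sub merge_unique {
        my ($left, $right) = @_;    # refs to two sorted arrays
        my @out;
        my ($i, $j) = (0, 0);
        while ($i < @$left && $j < @$right) {
            my $cmp = $left->[$i] <=> $right->[$j];
            if    ($cmp < 0) { push @out, $left->[$i++] }
            elsif ($cmp > 0) { push @out, $right->[$j++] }
            else             { push @out, $left->[$i++]; $j++ }  # duplicate: keep one copy
        }
        push @out, @{$left}[$i .. $#$left], @{$right}[$j .. $#$right];
        return \@out;
    }

    print "@{ merge_unique([1, 3, 5, 7], [2, 3, 6, 7]) }\n";  # 1 2 3 5 6 7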

Replies are listed 'Best First'.
Re^14: In-place sort with order assignment (runs)
by BrowserUk (Patriarch) on Sep 22, 2010 at 10:30 UTC
    And obviously, that [O(0.5 * (N + U) * log N)] can be simplified to O(N*log N).

    And that (as I've noted here before) is the trouble with big-O. It is such a blunt instrument.

    The moment you try to use it to analyse a particular variation of an algorithm in detail, some bright spark will conclude that your efforts are wrong because your detail reduces to some blunt canonical form.

    But, suggest that the variation is no different (better) than the classic algorithm, because they have the same big-O canonical reduction, and that same bright spark will tell you that you have to look in detail.

    And they'll start throwing Ds instead of Ns into the mix, but then hoist you by your own petard for suggesting there might or might not be some difference between D & N.

    It is obvious that tye's mergesort-unique algorithm will be more efficient than a standard mergesort on data with a high degree of duplication. The fact that in the general case across all datasets, they both reduce to the same big-O formula just goes to show what a nonsense big-O is.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re^14: In-place sort with order assignment (runs)
by JavaFan (Canon) on Sep 22, 2010 at 09:27 UTC
    I believe that O(0.5 * (N + U) * log N) is a good approximation of the complexity of a combined mergesort-unique algorithm.
    While O(0.5 * (N + U) * log N) isn't incorrect, given that 0 <= U <= N, O(0.5 * (N + U) * log N) and O(N log N) are the same. (That is, any function that's in O(0.5 * (N + U) * log N) is also in O(N log N) and vice versa, since 0.5 * N * log N <= 0.5 * (N + U) * log N <= N * log N.)
    The logic behind this is that there are log N merge steps to perform. In the lower merge steps the probability of duplicates is very low, so the number of comparisons will be proportional to N.
    Actually, for O(N log U) to be different from O(N log N), U must be o(N). That is, even if only 1 in 1000 elements is unique, O(N log U) is equivalent to O(N log N) (after all, log U == log(N/1000) == log(N) - log 1000). So, for a set where O(N log U) is different from O(N log N), the chance that two random elements are equal is actually pretty high.

    I think U should even be o(N^ε) for all ε > 0.
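
    A quick numeric illustration of the log(N/1000) == log(N) - log 1000 step (my own sketch, not part of the original post): with U = N/1000, the ratio log U / log N climbs toward 1 as N grows, so N log U and N log N stay within a constant factor of each other.

        use strict;
        use warnings;

        # With U = N/1000, log U / log N -> 1, so O(N log U) == O(N log N).
        for my $n (1e6, 1e9, 1e12) {
            my $u = $n / 1000;
            printf "N=%g: log U / log N = %.3f\n", $n, log($u) / log($n);
        }
        # Prints 0.500, 0.667, 0.750 -- approaching 1 as N grows.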

      O(0.5 * (N + U) * log N) and O(N log N) are the same

      That's exactly what the last sentence in my previous post said!

      So, for a set where O(N log U) is different from O(N log N), the chances of two random elements to be the same is actually pretty high.

      Only for very degenerate cases where there is an element that appears with probability close to 1.

        That's exactly what the last sentence in my previous post said!
        It was totally unclear to me that your last sentence referred to the first sentence of your post, and not to the sentence preceding it.