in reply to Re^12: Rapid text searches ( O(log n) time)
in thread Rapid text searches

Actually I never questioned your code just your headlines and assertions...

And what is it about my responses:

that does not satisfy you that the mention of O(1) was never more than a shorthand for saying "quickly and efficiently"--just "a headline, not a formal thesis."? It's purpose was soley to indicate that there are alternatives to using a database or a linear search from the top of the file.

Your own links show that--when coded properly, not just "crude demo code";--a dictionary sort can even outstrip a binary search O(log log N) versus O(log N).

For 38e6 (estimate) records, that's

c:\test>perl -E"say log 38e6" 17.4530967176907 c:\test>perl -E"say log log 38e6" 2.8595170952357

Sufficient potential improvement to warrent bringing that potential to the attention of the OP. And sufficiently closer to O(1) than O(N) to warrent the informal use of the former in preference to the latter in characterising the algorithm.

Is it possible to improve the crude implementation I posted to render a better than O(log N) efficiency? Unless the OP responds and gives us access to one or more realistic copies of his data--we'll never know for sure. Which is just repeating everything I said in the last three paras of 769241.

And that, as the saying goes, is all she wrote. In the absence of any further feedback from the OP about the true nature of his data, further speculation serves no other purpose than a pissing contest.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Replies are listed 'Best First'.
Re^14: Rapid text searches ( O(log n) time)
by LanX (Saint) on Jun 10, 2009 at 10:46 UTC
    Sadly regarding complexity the English article is not on the level of the German one. And both bring nearly no references!!! :-(

    Eine Untersuchung der Interpolationssuche erweist sich als sehr komplex, als Laufzeit kann jedoch O(log(log(n))) (n ist die Anzahl der Elemente) im durchschnittlichen Fall angenommen werden. Im ungünstigsten (die interpolierte erwartete Position ist immer am Rand) Fall beträgt die Laufzeit allerdings O(n).

    Furthermore as already said in my first reply your alleged O(1) code breaks interpolation search just after the first step!

    The other points your starting again are already fully answered.

    Cheers Rolf

    UPDATE: Palabras,palabras,palabras - Spanish 8)

      Okay, I speak no German beyond the obligatory tourists 10; but let me see if I can help you out here. One translation of that passage you cite is:

      An investigation of the interpolation search proves as very complex, as running time can however O(log(log(n))) (n is the number of elements) in the average case be accepted. In the most unfavorable (the interpolated expected position always is at the edge) case the running time amounts to however O(n).

      And If I interpolate that(1): The analysis of the interpolation search is very complex. It can achieve O(log log N) in the average case, but in the worst case it can approach O(N).

      By your own reference, the analysis is very complex. Do you wonder that I did not do such analysis?

      All I knew at the time of my original posting, is that on a previous occasion, (8 or 10 years ago), for a similar application, I had coded a dictionary search which--by trading calculation complexity for costly IO--beat out a both a standard binary search and a modified binary search. That was real code on real data, quite possibly still running helping the belastingdienst collect duty and taxes and 21 other functions.

      I had until this discussion never heard of an "interpolation search" (what a stupid name!), nor had I any idea of it's "theoretical efficiency"--nor cared to know.

      So, in the absence of that knowledge, but knowing it could beat O(log N), I said: O{1} instead of "better than O(log N)". Big deal. Get over it.

      As for my demo code "breaks interpolation search just after the first step!". I didn't set out to code an "interpolation search", nor even a piece of "certified code". Just to demonstrate the possibility. Eg. a starting point not an end point. It was imperfect as posted. Once again, get over it.

      This reminds me of the kids that would complain about the "offside rule having been violated" during schoolyard games of 15-or-16-per-side.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        kann angenommen werden= can be supposed, assumed

        and this only for a very complex "average case"! So without references it's hearsay.

        Anyway it's certainly not O(1), which (AGAIN) means fixed complexity, no matter how big n gets. And this algo isn't even stable and can go havoc up to O(n).

        Your playing around with a "hammer" of other peoples toolbox, take their advice and better don't touch ... you might hurt yourself.

        Cheers Rolf

        UPDATE: Palavra, palavra, palavra - Turkish 8)

        A reply falls below the community's threshold of quality. You may see it by logging in.