in reply to Re^11: Rapid text searches ( O(log n) time)
in thread Rapid text searches

Excellent, you're _really_ good! 8)

Actually, I never questioned your code, just your headlines and assertions...

It took a lot of time and research to post well-sourced, logical arguments in your language. Maybe you noticed the references and links to further reading I always added?

Anyway, if time allows I'll be pleased to post a stress test of your approach in the coming days. (UPDATE: DONE!)

BTW: you didn't reveal how you generated your test file ...

Cheers Rolf

UPDATE: Parole, parole, parole - Italian 8)


Re^13: Rapid text searches ( O(log n) time)
by BrowserUk (Patriarch) on Jun 10, 2009 at 09:36 UTC
    Actually, I never questioned your code, just your headlines and assertions...

    And what is it about my responses:

    • The headline claim for O(1) time is based on comparison with Perl's hashes, which also do a linear search down a bucket chain to resolve hash clashes, but are generally rated as O(1).
    • It was a headline, not a formal thesis. The absence of any further proof or discussion, combined with the "crude demo code", should be a strong indication of the lack of formalism.

    that does not satisfy you that the mention of O(1) was never more than shorthand for saying "quickly and efficiently"--just "a headline, not a formal thesis"? Its purpose was solely to indicate that there are alternatives to using a database or a linear search from the top of the file.

    Your own links show that--when coded properly, not just as "crude demo code"--a dictionary search can even outstrip a binary search: O(log log N) versus O(log N).

    For 38e6 (estimate) records, that's

     c:\test>perl -E"say log 38e6"
     17.4530967176907

     c:\test>perl -E"say log log 38e6"
     2.8595170952357

    Sufficient potential improvement to warrant bringing that potential to the attention of the OP. And sufficiently closer to O(1) than O(N) to warrant the informal use of the former in preference to the latter in characterising the algorithm.

    Is it possible to improve the crude implementation I posted to achieve better than O(log N) efficiency? Unless the OP responds and gives us access to one or more realistic copies of his data, we'll never know for sure. Which is just repeating everything I said in the last three paragraphs of 769241.
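For concreteness, here is a minimal, hypothetical sketch of the technique under discussion -- an interpolation search over a sorted array. This is not the demo code posted in the thread; `interp_search` is an illustrative name, and the code only shows the probe formula behind the O(log log N) average-case claim:

```perl
#!/usr/bin/perl
# Illustrative interpolation search over a sorted array of integers.
# Instead of always probing the midpoint (binary search), it estimates
# where the target should sit by linear interpolation between the
# values at the two ends of the current range.
use strict;
use warnings;

sub interp_search {
    my ( $aref, $target ) = @_;
    my ( $lo, $hi ) = ( 0, $#$aref );
    my $probes = 0;

    while ( $lo <= $hi
        and $target >= $aref->[$lo]
        and $target <= $aref->[$hi] ) {

        ++$probes;

        # Interpolated probe position within [lo, hi].
        my $span = $aref->[$hi] - $aref->[$lo];
        my $pos  = $span
            ? $lo + int( ( $hi - $lo ) * ( $target - $aref->[$lo] ) / $span )
            : $lo;

        if    ( $aref->[$pos] == $target ) { return ( $pos, $probes ) }
        elsif ( $aref->[$pos] <  $target ) { $lo = $pos + 1 }
        else                               { $hi = $pos - 1 }
    }
    return ( -1, $probes );    # not found
}

# On uniformly distributed keys the probe count stays tiny.
my @sorted = map { $_ * 10 } 0 .. 999_999;
my ( $idx, $probes ) = interp_search( \@sorted, 4_567_890 );
print "index: $idx, probes: $probes\n";
```

With perfectly uniform keys, as above, the very first interpolated probe lands on the target; real data will of course be less obliging.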

    And that, as the saying goes, is all she wrote. In the absence of any further feedback from the OP about the true nature of his data, further speculation serves no other purpose than a pissing contest.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Sadly, regarding complexity the English article is not on the level of the German one. And both cite nearly no references! :-(

      Eine Untersuchung der Interpolationssuche erweist sich als sehr komplex, als Laufzeit kann jedoch O(log(log(n))) (n ist die Anzahl der Elemente) im durchschnittlichen Fall angenommen werden. Im ungünstigsten (die interpolierte erwartete Position ist immer am Rand) Fall beträgt die Laufzeit allerdings O(n).

      [Translation: An analysis of interpolation search proves to be very complex, but a running time of O(log(log(n))) (n being the number of elements) can be assumed in the average case. In the worst case (when the interpolated expected position is always at the edge), however, the running time is O(n).]
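The worst case the quoted passage describes is easy to reproduce. The sketch below is hypothetical code, not from the thread (`interp_probes` is an illustrative name); it counts interpolation-search probes for uniform keys versus keys with one extreme outlier, which forces every interpolated position to the low edge of the range:

```perl
#!/usr/bin/perl
# Demonstrates the quoted worst case: a single huge outlier skews the
# interpolation so badly that every probe lands at the edge of the
# range, and the search degrades to one probe per element -- O(n).
use strict;
use warnings;

sub interp_probes {
    my ( $aref, $target ) = @_;
    my ( $lo, $hi, $probes ) = ( 0, $#$aref, 0 );
    while ( $lo <= $hi
        and $target >= $aref->[$lo]
        and $target <= $aref->[$hi] ) {
        ++$probes;
        my $span = $aref->[$hi] - $aref->[$lo];
        my $pos  = $span
            ? $lo + int( ( $hi - $lo ) * ( $target - $aref->[$lo] ) / $span )
            : $lo;
        return $probes if $aref->[$pos] == $target;
        if ( $aref->[$pos] < $target ) { $lo = $pos + 1 }
        else                           { $hi = $pos - 1 }
    }
    return $probes;
}

# Uniform keys: the first interpolated probe hits the target.
my @uniform = ( 1 .. 1000 );
printf "uniform: %d probes\n", interp_probes( \@uniform, 999 );   # 1 probe

# One extreme outlier: the search crawls forward one element at a time.
my @skewed = ( 1 .. 999, 1_000_000_000 );
printf "skewed:  %d probes\n", interp_probes( \@skewed, 999 );    # 999 probes
```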

      Furthermore, as already said in my first reply, your alleged O(1) code breaks interpolation search right after the first step!

      The other points you're raising again have already been fully answered.

      Cheers Rolf

      UPDATE: Palabras, palabras, palabras - Spanish 8)

        Okay, I speak no German beyond the obligatory tourist's 10; but let me see if I can help you out here. One translation of that passage you cite is:

        An investigation of the interpolation search proves as very complex, as running time can however O(log(log(n))) (n is the number of elements) in the average case be accepted. In the most unfavorable (the interpolated expected position always is at the edge) case the running time amounts to however O(n).

        And if I interpolate that: The analysis of the interpolation search is very complex. It can achieve O(log log N) in the average case, but in the worst case it can approach O(N).

        By your own reference, the analysis is very complex. Do you wonder that I did not attempt such an analysis?

        All I knew at the time of my original posting is that on a previous occasion, 8 or 10 years ago, for a similar application, I had coded a dictionary search which--by trading calculation complexity for costly IO--beat out both a standard binary search and a modified binary search. That was real code on real data, quite possibly still running, helping the belastingdienst collect duty and taxes and perform 21 other functions.

        I had, until this discussion, never heard of an "interpolation search" (what a stupid name!), nor had I any idea of its "theoretical efficiency"--nor cared to know.

        So, in the absence of that knowledge, but knowing it could beat O(log N), I said O(1) instead of "better than O(log N)". Big deal. Get over it.

        As for my demo code "break[ing] interpolation search just after the first step!": I didn't set out to code an "interpolation search", nor even a piece of "certified code", just to demonstrate the possibility. I.e. a starting point, not an end point. It was imperfect as posted. Once again, get over it.

        This reminds me of the kids that would complain about the "offside rule having been violated" during schoolyard games of 15-or-16-per-side.

