I agree with you that trying to use big-O to compare the time-costs of space complexity is pretty pointless. I was simply following moritz lead with the purpose of trying to make that very point.

I will strongly disagree with your statement about big-O notation being done only in academic papers.

That's not what I said. I said: "that big-O is rarely done properly outside of academic papers, which is quite different. Feel free to disagree with what I actually said also, but you might want to read on first.

What I said is not a million miles away from your own later statement:"Complexity theory is a real treasure when we can apply it properly.".

The problem is that people will insist on trying to use the analysis of algorithm complexity to compare the time-costs of actual implementations. It simply doesn't work for a variety of reasons.

Big-O notation is useful for comparing specific algorithm for achieving specific, and usually very limited, tasks. The problem is that real world tasks tend to be more complex in that they need the application of several specific algorithms, serially or in parallel, to achieve the final goal. And trying to combine the big-O notations of all the sub algorithms, is a task often far more complex that writing the program and benchmrking it.

Finally, big-O ignores the reality that in programmer terms, writing two nested loops is no harder than writing to independant loops. And that for small numbers, the difference in run time can be negligable.

For example, the OP benchmarked my implementation (not algorithm; moritz had mentioned the possibility of it before I posted), of the vec solution against Roy Johnston's implementation, and his own, on his real-world data sets consisting of "modest sized arrays" of "non-duplicate, positive integers", and found Roy Johnstone's the best for his purposes. And that, as they say, is that.

Out of personal interest, I added Roy's implementation to my own benchmark and found that it beat mine (in it's original, non-pre-extending form) when the number of elements in the arrays were less than 30. In the pre-extending version, this rises to 500,000. With the OP's 'modest size' obviously being less than 30, my implementation is a non starter in either form.

And, for the OP's specific requirements, all the big-O analysis of algorithms in the world isn't going to change that. In the real world, big-O is no substitute for benchmarks of actual implementations using actual data. Which makes this whole subthread, like big-O notation, academic.

I also completely disagree with your assessment that my implementation is "a variation of a Bloom Filter". The key element of a bloom filter is that it is a probabalistic solution subject to false positive possibilities. My implementation does not suffer this limitation, it is completely determanistic. If you want to categorise it, then it is a brute force lookup table. Nothing more.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

In reply to Re^5: Fast, Efficient Union and Intersection on arrays by BrowserUk
in thread Fast, Efficient Union and Intersection on arrays by barvin

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.