in reply to Re: Gay Dating; or, Thrifty Set Intersections with an Accumulator
in thread Gay Dating; or, Thrifty Set Intersections with an Accumulator

Big O notation is a little woolly all by itself. I think you know this, so I won't drive the point into the ground. For some value of "faster" (where faster is not better), an O(N2)-bounded algorithm grows faster than an O(N)-bounded one. That doesn't ensure the latter is always preferred; it only indicates.

I stated a range to exclude the possibility of better (for some value of "better") algorithms than O(N) (or worse than O(N2)). This is not a sophisticated or advanced node. This is a pretty basic introduction to thinking about how to attack a problem set.

As stated clearly in my FINAL WARNING, the script shown is not any sort of solution to the bioinformatics exercise, here only hinted at. It's a demonstration of technique, only loosely related to the preceding essay. There are only ten simple records; almost anything would work. If we wait, somebody may golf it down to a one-liner.

The difference between my code and your code is that, as the data set gets larger and the problem set more complex, my editing may go a bit more smoothly. I don't doubt that you'll wind up with a really efficient script; you're good. But as the stress piles on, I'll accommodate it; my pieces are flexible and modular (in the wider sense). To put it another way, I would fear to maintain your code, while you would only hate to maintain mine.

Note: Contents may change.
  • Comment on Re^2: Gay Dating; or, Thrifty Set Intersections with an Accumulator
  • Download Code

Replies are listed 'Best First'.
Re^3: Gay Dating; or, Thrifty Set Intersections with an Accumulator
by JavaFan (Canon) on Aug 11, 2010 at 18:41 UTC
    For some value of "faster" (where faster is not better), an O(N2)-bounded algorithm grows faster than an O(N)-bounded one.
    This shows you don't know much about big-Oh.

    Any algorithm whose running time is in "O(N)" (for N towards infinity) is also in "O(N2)".

    And big-Oh notation is not "woolly". In fact, if the running time of an algorithm is expressed as T(N), with N the size of the input, and it's said that T(N) is in O(f(N)), it means that:

        ∃ M, c > 0: ∀ m > M: T(m) <= c f(m)
    
    The difference between my code and your code is that, as the data set gets larger and the problem set more complex, my editing may go a bit more smoothly.
    And you base this statement on what? (And what has editing to do with the size of the data set?)
    To put it another way, I would fear to maintain your code, while you would only hate to maintain mine.
    I don't know what to think of a programmer that fears to maintain code that consists of two small, self contained blocks, and a small subroutine.