in reply to Re^5: [OT] The interesting problem of comparing (long) bit-strings.
in thread [OT] The interesting problem of comparing bit-strings.

This node was taken out by the NodeReaper on Apr 26, 2015 at 09:21 UTC

Re^7: [OT] The interesting problem of comparing (long) bit-strings.
by salva (Canon) on Mar 31, 2015 at 10:12 UTC
    For flat arrays, in order to insert an element into a sorted array of N elements, you have to perform two operations: 1) find the insertion point, which is O(logN) with a binary search (O(N) if you scan linearly), and 2) insert the new value there, which is O(N) because you have to shift every element after the insertion point by one place. Globally the operation is O(N).
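
    To make that concrete, here is a minimal C sketch of the two-step insertion (my own illustrative code, not from this thread; the function name and the int payload are arbitrary):

        #include <string.h>

        /* Insert v into the sorted array a[0..n-1]; the caller must
           guarantee capacity for n+1 elements. Finding the slot is
           O(logN) via binary search; the memmove shift is O(N). */
        static void sorted_insert(int *a, size_t n, int v)
        {
            size_t lo = 0, hi = n;
            while (lo < hi) {                 /* binary search for slot */
                size_t mid = lo + (hi - lo) / 2;
                if (a[mid] < v) lo = mid + 1;
                else            hi = mid;
            }
            memmove(a + lo + 1, a + lo, (n - lo) * sizeof *a);
            a[lo] = v;                        /* the O(N) shift dominates */
        }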

    With lists, step 2 is cheaper, O(1), but you still have to find the insertion point, and in a list that is always O(N) because there is no random access to binary-search over. So, globally, the operation remains O(N).
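
    The corresponding list sketch (same caveats; a hand-rolled singly linked list):

        struct node { int value; struct node *next; };

        /* Insert node nn into a sorted singly linked list: the walk to
           the insertion point is O(N), the splice itself is O(1). */
        static void list_insert(struct node **head, struct node *nn)
        {
            struct node **p = head;
            while (*p && (*p)->value < nn->value)
                p = &(*p)->next;              /* the O(N) pointer chase */
            nn->next = *p;                    /* the O(1) splice */
            *p = nn;
        }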

    Lists are cache-unfriendly for two reasons: 1) they use more memory than arrays (for built-in types at least x2, but commonly x4 or x8 once you count the pointers and allocator overhead) and 2) the nodes may be scattered all over memory, which renders cache prefetching useless. So it is easy to get into a situation where advancing to the next element always requires going to L3 or even RAM.

    In contrast, traversing an array, even one that doesn't fit into L2, is much faster because cache prefetching is fully effective.
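
    A rough way to see this yourself (a sketch, not a careful benchmark; the size, the seed and the shuffled layout are all just illustrative):

        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>

        struct node { long long value; struct node *next; };

        int main(void)
        {
            enum { N = 1 << 22 };             /* bigger than a typical L2 */
            long long *a = malloc(N * sizeof *a);
            struct node *pool = malloc(N * sizeof *pool);
            size_t *perm = malloc(N * sizeof *perm);
            if (!a || !pool || !perm) return 1;
            for (size_t i = 0; i < N; i++) { a[i] = (long long)i; perm[i] = i; }
            srand(42);
            for (size_t i = N - 1; i > 0; i--) {  /* Fisher-Yates shuffle */
                size_t j = (size_t)rand() % (i + 1);
                size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
            }
            for (size_t i = 0; i < N; i++) {      /* link nodes in shuffled */
                pool[perm[i]].value = (long long)i;   /* order to scatter   */
                pool[perm[i]].next = (i + 1 < N) ? &pool[perm[i + 1]] : NULL;
            }
            struct node *head = &pool[perm[0]];
            long long sum = 0;
            clock_t t0 = clock();
            for (size_t i = 0; i < N; i++) sum += a[i];  /* prefetch-friendly */
            clock_t t1 = clock();
            for (struct node *p = head; p; p = p->next) sum += p->value;
            clock_t t2 = clock();
            printf("array: %.3fs  list: %.3fs  (sum %lld)\n",
                   (double)(t1 - t0) / CLOCKS_PER_SEC,
                   (double)(t2 - t1) / CLOCKS_PER_SEC, sum);
            free(a); free(pool); free(perm);
            return 0;
        }

    Shuffling the link order is what defeats the prefetcher here; if the nodes happened to be linked in address order, the gap between the two traversals would mostly disappear.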

    BTW, you get O(logN) insertions when you use a balanced tree.
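
    A bare-bones sketch of the tree alternative (an unbalanced BST for brevity; an AVL or red-black tree is what actually guarantees the O(logN) bound):

        struct tnode { int value; struct tnode *left, *right; };

        /* Walk down from the root and hang the new node off the empty
           slot found: O(height), i.e. O(logN) when the tree is balanced. */
        static void tree_insert(struct tnode **root, struct tnode *nn)
        {
            while (*root)
                root = nn->value < (*root)->value ? &(*root)->left
                                                  : &(*root)->right;
            nn->left = nn->right = NULL;
            *root = nn;
        }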

      I take it that Boyer-Moore was a bust then?


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
      In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
        What does your data look like?

        If it is mostly random, without repeated patterns (for instance, data where most bits are 0), and the needles are long, B-M can potentially be several orders of magnitude faster than the brute-force approach.

        In the bad-data cases, B-M just degrades to the equivalent of the brute-force algorithm; I don't think it would introduce much overhead.
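
        For reference, a sketch of the simpler Boyer-Moore-Horspool variant, over bytes (my own illustrative code, not from this thread; a bit-string version would additionally have to try the eight bit-alignments of the needle, e.g. by searching for shifted copies of it):

            #include <stddef.h>
            #include <string.h>

            /* Boyer-Moore-Horspool over bytes: returns the offset of the
               first occurrence of needle in hay, or (size_t)-1 if absent.
               The bad-character table lets a mismatch skip up to nlen
               bytes at once, which is where the speedup comes from. */
            static size_t bmh_search(const unsigned char *hay, size_t hlen,
                                     const unsigned char *needle, size_t nlen)
            {
                size_t skip[256];
                if (nlen == 0 || hlen < nlen) return (size_t)-1;
                for (size_t i = 0; i < 256; i++) skip[i] = nlen;
                for (size_t i = 0; i + 1 < nlen; i++)
                    skip[needle[i]] = nlen - 1 - i;
                for (size_t pos = 0; pos <= hlen - nlen;
                     pos += skip[hay[pos + nlen - 1]])
                    if (memcmp(hay + pos, needle, nlen) == 0)
                        return pos;
                return (size_t)-1;
            }

        On random data the average shift approaches the needle length, which is the potential orders-of-magnitude win; on degenerate data (the skip table full of small values) the shifts collapse to 1 and it behaves like the naive scan.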