in reply to Re^10: [OT] The interesting problem of comparing (long) bit-strings.
in thread [OT] The interesting problem of comparing bit-strings.

I can't even see how you would adapt B-M to bit-string search.

That reminds me of another of your questions. The trick is to consider that a new "byte" starts at every bit position.
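To make the "byte at every bit" idea concrete, here is a minimal sketch, not taken from the program benchmarked below. It assumes MSB-first bit numbering within each byte, and the helper name byte_at_bit is hypothetical.

```python
def byte_at_bit(buf: bytes, bitpos: int) -> int:
    """Return the 8 bits of buf that start at bit offset bitpos.

    Assumption: bits are numbered MSB-first inside each byte.
    The caller must ensure bitpos + 8 <= 8 * len(buf).
    """
    i, r = divmod(bitpos, 8)
    if r == 0:
        # Byte-aligned window: no bit fiddling needed.
        return buf[i]
    # Unaligned window: join two adjacent bytes and cut out 8 bits.
    pair = (buf[i] << 8) | buf[i + 1]
    return (pair >> (8 - r)) & 0xFF


buf = bytes([0b10110011, 0b01010101])
# 16 bits contain 9 overlapping "bytes", one per starting bit 0..8.
windows = [byte_at_bit(buf, p) for p in range(9)]
```

Each of those windows is an ordinary value in 0..255, so it can index an ordinary 256-entry table.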

but building delta tables with 64-bit indices is obviously not on

From here to the end, most of what you say is wrong. B-M for bit-strings can be implemented using a fixed-size table that fits comfortably in the L1 cache (the needle size doesn't matter at all).

Even better, most of the time, all the work can be done on bytes, with very little bit-level fiddling.

In the worst case, the overhead over the brute-force approach would probably be a few machine instructions per haystack bit, on L1-cached data!
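As an illustration of the fixed-size table claim, here is a rough sketch of a Horspool-style variant (a simplification of B-M that uses only a bad-character table). It is an assumption about how such a search could look, not the code behind any timings in this thread. For readability the bit-strings are held as Python strings of '0' and '1'; a real implementation would work on packed bytes. The names build_shift_table and bit_search are hypothetical.

```python
def build_shift_table(needle: str) -> list[int]:
    """256-entry shift table, indexed by an 8-bit window value.

    The table size is fixed at 256 whatever the needle length.
    Assumption: the needle is a '0'/'1' string of at least 8 bits.
    """
    n = len(needle)
    if n < 8:
        raise ValueError("this sketch needs a needle of at least 8 bits")
    # Default: the window occurs nowhere in the needle (except possibly
    # at its very end), so n - 7 is the smallest shift not ruled out.
    table = [n - 7] * 256
    # Every bit position starts a new "byte". Later (rightmost) windows
    # overwrite earlier ones, leaving the smallest safe shift.
    for j in range(n - 8):
        table[int(needle[j:j + 8], 2)] = n - 8 - j
    return table


def bit_search(hay: str, needle: str) -> int:
    """Return the first bit offset of needle in hay, or -1."""
    n, m = len(needle), len(hay)
    table = build_shift_table(needle)
    p = 0
    while p + n <= m:
        if hay[p:p + n] == needle:
            return p
        # Shift by the value stored for the last 8 bits under the window.
        p += table[int(hay[p + n - 8:p + n], 2)]
    return -1
```

With an 8-bit needle every shift is 1 and this degenerates to brute force; longer needles allow shifts of up to n - 7 bits per table lookup.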


Replies are listed 'Best First'.
Re^12: [OT] The interesting problem of comparing (long) bit-strings.
by BrowserUk (Patriarch) on Mar 31, 2015 at 13:07 UTC

    I can continue to explain my reasoning; and you can continue to state your beliefs till we're both blue in the face.

    Blah! Prove it!

      # $file $needle_bit_offset $needle_bit_length $repetitions
      ./bitstrstr test.dat 100000000 2000 10
      needle found at 100000000, expected at 100000000 in 1.6/10 = 0.16ms

      Update:

      $ for i in 16 20 30 40 60 100 200 400 1000 3000 10000; do echo $i; ./bitstrstr test.dat 100000000 $i 10; done
      16
      needle found at 164016, expected at 100000000 in 1.1/10 = 0.11ms
      20
      needle found at 949378, expected at 100000000 in 1.8/10 = 0.18ms
      30
      needle found at 100000000, expected at 100000000 in 1018.4/10 = 101.84ms
      40
      needle found at 100000000, expected at 100000000 in 38/10 = 3.8ms
      60
      needle found at 100000000, expected at 100000000 in 924.1/10 = 92.41ms
      100
      needle found at 100000000, expected at 100000000 in 12/10 = 1.2ms
      200
      needle found at 100000000, expected at 100000000 in 6.2/10 = 0.62ms
      400
      needle found at 100000000, expected at 100000000 in 3.7/10 = 0.37ms
      1000
      needle found at 100000000, expected at 100000000 in 2.3/10 = 0.23ms
      3000
      needle found at 100000000, expected at 100000000 in 0.9/10 = 0.09ms
      10000
      needle found at 100000000, expected at 100000000 in 0.4/10 = 0.04ms

        And what do the numbers look like if the offset is 99999997 and the length 2003?


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
        In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked