in reply to Re: Why Boyer-Moore, Horspool, alpha-skip et.al don't work for bit strings. (And is there an alternative that does?)

The point though is that instead of having a table of shifts required for a given byte, you can have a table of shifts required for the position of the rightmost mismatch.

The rightmost mismatching what? A single bit? A group of 8 bits? Or 13 bits? 1000001 bits?

My point is that a table of single bits 0 or 1 doesn't help; but any larger than that, and you're into the "groups of aligned bits" problem I described above.

Whatever alignment you choose, byte/word/other; and whatever group of bits from the mismatched position in the haystack you use as your index, shift the needle 1-bit either way from that alignment and that group is no longer relevant.
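To make the alignment problem concrete, here is a minimal sketch. It assumes bits are packed most-significant-bit first, and `bits_to_bytes` is a hypothetical helper written only for this illustration. It packs the same 16-bit pattern at bit offsets 0, 1 and 2 and prints the resulting bytes:

```python
def bits_to_bytes(bits: str) -> bytes:
    """Pack a string of '0'/'1' characters into bytes (MSB first), zero-padding the tail."""
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


needle = "1110111001101001"

# The same bit pattern, placed 0, 1 and 2 bits into the haystack.
for offset in range(3):
    packed = bits_to_bytes("0" * offset + needle)
    print(offset, packed.hex())
```

Every byte changes from one offset to the next, so a shift table indexed by a byte-sized group taken at one alignment says nothing about the other seven.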

I don't know how to better describe it; but until you've tried to implement what you are describing -- using bits; not bytes pretending to be bits -- you will not appreciate the problem.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

Re^3: Why Boyer-Moore, Horspool, alpha-skip et.al don't work for bit strings. (And is there an alternative that does?)
by Anonymous Monk on Apr 06, 2015 at 00:37 UTC
    How about making the bits pretend to be bytes? =)
      How about making the bits pretend to be bytes?

      Apart from the fact that a 1GB binary bitstring that fits comfortably in memory would suddenly occupy 8GB and push my machine into swapping; what makes you think searching 8GB of ascii-ized binary would be faster than searching 1GB of binary binary?

      Okay, so you'd be able to use one or other of the fast string search algorithms, but the trouble with that is, with only 2 x 8-bit patterns involved -- ie. a 2 symbol alphabet -- they simply no longer provide any benefit. So you're back to using a brute-force search algorithm; but on 8 x the volume of data.
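As a rough illustration of the two-symbol point, the sketch below builds a Horspool-style bad-character table for an ascii-ized binary needle and for an ordinary text needle of the same length. The function name `horspool_table` and both needles are assumptions made for this example, not anything from the thread:

```python
def horspool_table(needle: str) -> dict:
    """Horspool bad-character shifts: distance from a symbol's last
    occurrence (ignoring the final position) to the end of the needle."""
    m = len(needle)
    table = {}
    for i, symbol in enumerate(needle[:-1]):
        table[symbol] = m - 1 - i
    return table


binary_needle = "1110111001101001"   # 16 symbols drawn from {'0', '1'}
text_needle = "abcdefghijklmnop"     # 16 distinct symbols

binary_table = horspool_table(binary_needle)
text_table = horspool_table(text_needle)

# Any symbol missing from the table shifts by the full needle length.
print(binary_table)                           # only two entries exist
print(text_table.get("z", len(text_needle)))  # symbol absent from the needle
```

With only '0' and '1' in play, both symbols almost always occur near the end of the needle, so the available shifts stay small; with a byte alphabet most haystack symbols are absent from the needle and allow a full-length skip.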


        I mean literally convert the bits to bytes: each run of 8 bits becomes 1 byte, so an 8 Gb bitstring becomes a 1 GB byte string. Then take the needles and convert them to bytes too, shifting the needle bits out to cover the unaligned cases. For example, the needle
        11101110 01101001 becomes 11011100 11010010, 11011100 11010011, 10111001 10100110, 10111001 10100111, ... for each of the 8 bit offsets. Convert those to bytes and then use one of the efficient string search algorithms. A match on the needle lets you either know 100% (if it's on the non-shifted bytes) or at least know that there is a possible match on the shifted bytes, which can then be verified at the bit level for the full match.
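One way this shifted-needle idea might be sketched is shown below. It is a simplification, not the poster's exact scheme: instead of enumerating every fill-bit variation, it searches at each of the 8 bit offsets only for the needle bytes that are fully determined at that offset, then verifies each candidate at the bit level. It assumes MSB-first packing and a needle of at least 15 bits; `pack_bits` and `find_bits` are hypothetical names.

```python
def pack_bits(bits: str) -> bytes:
    """Pack a '0'/'1' string into bytes (MSB first), zero-padding the tail."""
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


def find_bits(haystack: bytes, needle_bits: str) -> int:
    """Return the bit offset of the first match of needle_bits, or -1."""
    n = len(needle_bits)
    if n < 15:
        raise ValueError("needle must span at least one whole byte at every offset")
    total = len(haystack) * 8
    needle_int = int(needle_bits, 2)
    mask = (1 << n) - 1

    def matches_at(bit_pos: int) -> bool:
        # Bit-level verification of a candidate found by the byte search.
        if bit_pos < 0 or bit_pos + n > total:
            return False
        first = bit_pos // 8
        last = (bit_pos + n + 7) // 8          # exclusive byte index
        window = int.from_bytes(haystack[first:last], "big")
        drop = last * 8 - (bit_pos + n)        # trailing bits past the needle
        return (window >> drop) & mask == needle_int

    best = -1
    for k in range(8):                         # k = bit offset within a byte
        lead = (8 - k) % 8                     # needle bits in the partial first byte
        full = (n - lead) // 8                 # whole bytes fixed by the needle
        core = bytes(int(needle_bits[lead + 8 * i:lead + 8 * i + 8], 2)
                     for i in range(full))
        start = 0
        while True:
            idx = haystack.find(core, start)   # fast byte-level search
            if idx < 0:
                break
            bit_pos = idx * 8 - lead
            if matches_at(bit_pos):
                if best < 0 or bit_pos < best:
                    best = bit_pos
                break
            start = idx + 1
    return best


needle = "1110111001101001"
haystack = pack_bits("101" + needle + "00000")
print(find_bits(haystack, needle))
```

The byte-level `find` does the skipping; how well it pays off depends on how often the short core bytes occur by chance, since every chance hit costs a bit-level check.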