in reply to Re^3: Why Boyer-Moore, Horspool, alpha-skip et.al don't work for bit strings. (And is there an alternative that does?)
in thread Why Boyer-Moore, Horspool, alpha-skip et.al don't work for bit strings. (And is there an alternative that does?)

How about making the bits pretend to be bytes?

Apart from the fact that a 1GB binary bitstring that fits comfortably in memory would suddenly occupy 8GB and push my machine into swapping, what makes you think searching 8GB of ASCII-ized binary would be faster than searching 1GB of binary binary?

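Okay, so you'd be able to use one or other of the fast string search algorithms, but the trouble with that is, with only 2 x 8-bit patterns involved -- ie. a 2-symbol alphabet -- they simply no longer provide any benefit. Their long skips come from hitting a character that is rare in, or absent from, the needle, and with only '0' and '1' both symbols nearly always occur close to the end of the needle. So you're back to using a brute-force search algorithm, but on 8 x the volume of data.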
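
To put a number on that, here is a minimal sketch (Python, purely for illustration) of the Horspool bad-character table for a needle written out as '0'/'1' characters. The function name `horspool_shifts` and the 16-bit sample needle are assumptions made for this example; nothing here comes from a particular library.

```python
def horspool_shifts(needle: str) -> dict[str, int]:
    """Horspool bad-character table: for each symbol that occurs in
    needle[:-1], the distance from its last occurrence to the end."""
    m = len(needle)
    shifts: dict[str, int] = {}
    for i, ch in enumerate(needle[:-1]):
        shifts[ch] = m - 1 - i  # later occurrences overwrite earlier ones
    return shifts


def shift_for(needle: str, ch: str) -> int:
    """Shift applied when the text character under the needle's last
    position is ch; symbols absent from the table skip the full length."""
    return horspool_shifts(needle).get(ch, len(needle))


bit_needle = "1110111001101001"      # 16 'bits' stored one per byte
print(horspool_shifts(bit_needle))   # {'1': 3, '0': 1}

# With a large alphabet, a text character that never occurs in the
# needle lets the search jump the whole needle length in one step.
print(shift_for("horspool", "z"))    # 8
```

With the two-symbol needle the best possible skip is 3 positions and the other symbol only allows 1, whereas the text needle can jump its full length whenever it meets a character it does not contain. That is the whole advantage of the skip table, and it is mostly gone with a binary alphabet.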


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

Re^5: Why Boyer-Moore, Horspool, alpha-skip et.al don't work for bit strings. (And is there an alternative that does?)
by Anonymous Monk on Apr 06, 2015 at 06:30 UTC
    I mean literally convert the bits to bytes: each run of 8 bits becomes 1 byte, so an 8 Gb bitstring becomes a 1 GB byte string. Then take the needle and convert it to bytes as well, shifting the needle bits to cover the unaligned cases. For example, the needle
    11101110 01101001 becomes 11011100 11010010, 11011100 11010011, 10111001 10100110, 10111001 10100111 ... and so on for each of the 8 bit alignments. Convert those to bytes and then use one of the efficient string algorithms. A match on a needle tells you either that you have a 100% match (if it's on the non-shifted bytes) or at least that there is a possible match on the shifted bytes, which can be verified at the bit level for the full match.
      The shift is basically giving you string needles that represent all of the possible alignments.
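
    A rough sketch of the idea in Python, for illustration only. It is a simplified variant of what is described above: instead of enumerating every fill-bit variation of the partial bytes, it searches for just the bytes that the needle fully determines at each alignment and then verifies candidates at the bit level. The names `bits_of`, `window` and `find_bits` are made up for this example, and the needle is assumed to be at least 15 bits long so that every alignment has at least one whole byte to search for.

```python
def bits_of(data: bytes) -> str:
    """Render bytes as a string of '0'/'1' characters, MSB first."""
    return "".join(f"{b:08b}" for b in data)


def window(data: bytes, start: int, n: int) -> str:
    """Return the n bits of data beginning at bit offset start."""
    first, last = start // 8, (start + n + 7) // 8
    off = start % 8
    return bits_of(data[first:last])[off:off + n]


def find_bits(haystack: bytes, needle_bits: str) -> list[int]:
    """Return every bit offset at which needle_bits occurs in haystack."""
    n = len(needle_bits)
    if n < 15:
        raise ValueError("this sketch needs a needle of at least 15 bits")
    total_bits = len(haystack) * 8
    hits = []
    for shift in range(8):            # bit alignment of the needle's start
        lead = (8 - shift) % 8        # needle bits in the partial first byte
        full = (n - lead) // 8        # whole bytes the needle determines
        core = bytes(
            int(needle_bits[lead + 8 * i: lead + 8 * i + 8], 2)
            for i in range(full)
        )
        pos = haystack.find(core)     # ordinary byte-string search
        while pos != -1:
            start = pos * 8 - lead    # where the whole needle would begin
            # Candidate only: confirm the edge bits at the bit level.
            if (0 <= start and start + n <= total_bits
                    and window(haystack, start, n) == needle_bits):
                hits.append(start)
            pos = haystack.find(core, pos + 1)
    return sorted(hits)


needle = "1110111001101001"
hay = int("101" + needle + "00000", 2).to_bytes(3, "big")
print(find_bits(hay, needle))                   # [3]
print(find_bits(b"\x00\xee\x69\x00", needle))   # [8]
```

    Note that for a 16-bit needle every unaligned case leaves only a single whole byte to search for, so most of the work lands in the bit-level verification. The approach is likely to pay off only when the needle is long enough that each alignment yields a multi-byte search string.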