The basic premise of (the titled) fast string matching algorithms, is that you preprocess the needle and construct a table of shifts or skips that allow you to skip over chunks of the haystack and thus speed things up.
The following, I hope clear, explanation (they are few and far between) describes a variant called Quick Search.
You start with a haystack and needle:
0000111010011100101110110111000111101011110011111011111110000000100010 +0100001010010101000001101000110010011000101101010110011011000000 000010100101010000011010
You inspect the needle and build the skip table:
00001010 01010100 00011010 00001010 shift 24 01010100 shift 16 00011010 shift 8 xxxxxxxx shift 32 (all other patterns not found in the needle allow ma +ximum shift)
And apply it. Try the pattern at the start of the haystack; doesn't match, so look up the next 8 bits of the haystack in the table, and find the shift = 32:
000011101001110010111011 01110001 111010111100111110111111100000001000 +100100001010010101000001101000110010011000101101010110011011000000 000010100101010000011010 ???????? shift = 32
So, apply the shift and try the needle at the new position. No match, lookup the next 8 bits of the haystack to find a shift of 32:
0000111010011100101110110111000111101011110011111011111110000000100010 +0100001010010101000001101000110010011000101101010110011011000000 000010100101010000011010???????? shift + = 32
Apply the shift, try the needle at the new position. No match, look up the next 8 bits, get the shift of 8:
0000111010011100101110110111000111101011110011111011111110000000100010 +0100001010010101000001101000110010011000101101010110011011000000 000010 +100101010000011010???????? shift = 8
Apply the shift, try the match, And success. Needle found.
0000111010011100101110110111000111101011110011111011111110000000100010 +0100001010010101000001101000110010011000101101010110011011000000 + 000010100101010000011010 Found at 72
Now lets try that with the same haystack but another needle:
10000101 00101010 00001101 10000101 shift 24 00101010 shift 16 00001101 shift 8 xxxxxxxx shift 32 0000111010011100101110110111000111101011110011111011111110000000100010 +0100001010010101000001101000110010011000101101010110011011000000 100001010010101000001101???????? shift 32 100001010010101000001101???????? shift + 32 100001 +010010101000001101???????? shift 32 + 100001010010101000001101???????? shift 32 + 10000101001 +0101000001101 >>> Not found.
Great! Four compares & four skips to discover the needle isn't in the haystack.
0000111010011100101110110111000111101011110011111011111110000000100010 +0100001010010101000001101000110010011000101101010110011011000000 + 100001010010101000001101
And that's why Boyer-Moore, Horspool, Alpha-Skip, Quick Search et. al. don't work for bit-strings.
It doesn't matter what size you make the chunks of bits -- above I used 8 -- the skip table will only ever consider aligned groups of bits, but bit-strings inherently consist entirely of unaligned bits!
(Don't believe me; just try it for yourself.)
To prove my instincts were right?
Maybe, but mostly because I wanted to be wrong. I wanted there to be a better than brute force mechanism. And I'm still hoping that someone will point me at one.
In reply to Why Boyer-Moore, Horspool, alpha-skip et.al don't work for bit strings. (And is there an alternative that does?) by BrowserUk
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |