in reply to Re^7: Why Boyer-Moore, Horspool, alpha-skip et.al don't work for bit strings. (And is there an alternative that does?)
in thread Why Boyer-Moore, Horspool, alpha-skip et.al don't work for bit strings. (And is there an alternative that does?)
It required one small but important modification to your algorithm; that was a bitch to track down.
The table calculation: table[ idx ] = needleLen - 8 + needleOffset; can result in a value of 0 (ie. 24 - 8 - 16 = 0 ), which means that the algorithm stalls.
My current fix is to simply test for and replace any 0 table entries with 1 so it keeps moving forward. (I suspect that there is a better fix, but its not leaping off the page at me; and its time to sleep.)
The upshot is that a brute force search that takes 9.72 seconds (a 220 bits needle in a 231 bits haystack, found at an offset of 2e9) takes:
Over 3 orders of magnitude speed up is all together a very successful outcome. Thank you anonymonk.
Thanks also to salva & oiskuu who's persistent belief that a fast-bit-search was possible, led me to start this thread and get this solution.
Here is a table of timings; bitstrstr1 is the brute-force; 3 is the 8-bit; 4 is the 16-bit; and 5 the 12-bit.
The first entry (-1) is a full search for a non-existent needle:
C:\test\C>bitstrstr "DeBruijn(2^31).bin" 1048576 1 250000000 v:00000000008E0040 fsize:0000000010000004 vsize:2000000 bitstrstr1: 0 : -1 took:1.030802e+001 secs bitstrstr1: 250000000 : 250000000 took:1.274025e+000 secs bitstrstr1: 500000000 : 500000000 took:2.420335e+000 secs bitstrstr1: 750000000 : 750000000 took:3.604043e+000 secs bitstrstr1: 1000000000 : 1000000000 took:4.826112e+000 secs bitstrstr1: 1250000000 : 1250000000 took:6.205683e+000 secs bitstrstr1: 1500000000 : 1500000000 took:7.483219e+000 secs bitstrstr1: 1750000000 : 1750000000 took:8.526418e+000 secs bitstrstr1: 2000000000 : 2000000000 took:9.723123e+000 secs bitstrstr3: 0 : -1 took:1.019202e-001 secs bitstrstr3: 250000000 : 250000000 took:3.729577e-002 secs bitstrstr3: 500000000 : 500000000 took:3.556026e-002 secs bitstrstr3: 750000000 : 750000000 took:1.154790e-002 secs bitstrstr3: 1000000000 : 1000000000 took:1.110663e-002 secs bitstrstr3: 1250000000 : 1250000000 took:1.219347e-002 secs bitstrstr3: 1500000000 : 1500000000 took:1.436291e-002 secs bitstrstr3: 1750000000 : 1750000000 took:2.335553e-002 secs bitstrstr3: 2000000000 : 2000000000 took:2.807604e-002 secs bitstrstr4: 0 : -1 took:8.661798e-003 secs bitstrstr4: 250000000 : 250000000 took:4.236888e-003 secs bitstrstr4: 500000000 : 500000000 took:4.733442e-003 secs bitstrstr4: 750000000 : 750000000 took:4.514161e-003 secs bitstrstr4: 1000000000 : 1000000000 took:4.211377e-003 secs bitstrstr4: 1250000000 : 1250000000 took:4.328529e-003 secs bitstrstr4: 1500000000 : 1500000000 took:4.249651e-003 secs bitstrstr4: 1750000000 : 1750000000 took:4.525668e-003 secs bitstrstr4: 2000000000 : 2000000000 took:4.428185e-003 secs bitstrstr5: 0 : -1 took:2.952949e-002 secs bitstrstr5: 250000000 : 250000000 took:1.516276e-002 secs bitstrstr5: 500000000 : 500000000 took:7.000594e-003 secs bitstrstr5: 750000000 : 750000000 took:6.885374e-003 secs bitstrstr5: 1000000000 : 1000000000 took:6.434349e-003 secs bitstrstr5: 1250000000 : 1250000000 took:6.177053e-003 secs bitstrstr5: 1500000000 : 1500000000 took:6.085954e-003 secs bitstrstr5: 1750000000 : 1750000000 took:8.395146e-003 secs bitstrstr5: 2000000000 : 2000000000 took:7.613071e-003 secs
The code needs a little clean-up, but I'll post it if anyone is interested?
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^9: Why Boyer-Moore, Horspool, alpha-skip et.al don't work for bit strings. (Thank you Anonymonk + others.)
by Anonymous Monk on Apr 08, 2015 at 17:14 UTC | |
by BrowserUk (Patriarch) on Apr 08, 2015 at 17:43 UTC |