It required one small but important modification to your algorithm; that was a bitch to track down.

The table calculation: table[ idx ] = needleLen - 8 + needleOffset; can result in a value of 0 (ie. 24 - 8 - 16 = 0 ), which means that the algorithm stalls.

My current fix is to simply test for and replace any 0 table entries with 1 so it keeps moving forward. (I suspect that there is a better fix, but its not leaping off the page at me; and its time to sleep.)

The upshot is that a brute force search that takes 9.72 seconds (a 220 bits needle in a 231 bits haystack, found at an offset of 2e9) takes:

  1. 8-bit table: 2.807604e-002 secs.
  2. 12-bit table: 7.613071e-003 secs.
  3. 16-bit table: 4.428185e-003 secs.

Over 3 orders of magnitude speed up is all together a very successful outcome. Thank you anonymonk.

Thanks also to salva & oiskuu who's persistent belief that a fast-bit-search was possible, led me to start this thread and get this solution.

Here is a table of timings; bitstrstr1 is the brute-force; 3 is the 8-bit; 4 is the 16-bit; and 5 the 12-bit.

The first entry (-1) is a full search for a non-existent needle:

C:\test\C>bitstrstr "DeBruijn(2^31).bin" 1048576 1 250000000 v:00000000008E0040 fsize:0000000010000004 vsize:2000000 bitstrstr1: 0 : -1 took:1.030802e+001 secs bitstrstr1: 250000000 : 250000000 took:1.274025e+000 secs bitstrstr1: 500000000 : 500000000 took:2.420335e+000 secs bitstrstr1: 750000000 : 750000000 took:3.604043e+000 secs bitstrstr1: 1000000000 : 1000000000 took:4.826112e+000 secs bitstrstr1: 1250000000 : 1250000000 took:6.205683e+000 secs bitstrstr1: 1500000000 : 1500000000 took:7.483219e+000 secs bitstrstr1: 1750000000 : 1750000000 took:8.526418e+000 secs bitstrstr1: 2000000000 : 2000000000 took:9.723123e+000 secs bitstrstr3: 0 : -1 took:1.019202e-001 secs bitstrstr3: 250000000 : 250000000 took:3.729577e-002 secs bitstrstr3: 500000000 : 500000000 took:3.556026e-002 secs bitstrstr3: 750000000 : 750000000 took:1.154790e-002 secs bitstrstr3: 1000000000 : 1000000000 took:1.110663e-002 secs bitstrstr3: 1250000000 : 1250000000 took:1.219347e-002 secs bitstrstr3: 1500000000 : 1500000000 took:1.436291e-002 secs bitstrstr3: 1750000000 : 1750000000 took:2.335553e-002 secs bitstrstr3: 2000000000 : 2000000000 took:2.807604e-002 secs bitstrstr4: 0 : -1 took:8.661798e-003 secs bitstrstr4: 250000000 : 250000000 took:4.236888e-003 secs bitstrstr4: 500000000 : 500000000 took:4.733442e-003 secs bitstrstr4: 750000000 : 750000000 took:4.514161e-003 secs bitstrstr4: 1000000000 : 1000000000 took:4.211377e-003 secs bitstrstr4: 1250000000 : 1250000000 took:4.328529e-003 secs bitstrstr4: 1500000000 : 1500000000 took:4.249651e-003 secs bitstrstr4: 1750000000 : 1750000000 took:4.525668e-003 secs bitstrstr4: 2000000000 : 2000000000 took:4.428185e-003 secs bitstrstr5: 0 : -1 took:2.952949e-002 secs bitstrstr5: 250000000 : 250000000 took:1.516276e-002 secs bitstrstr5: 500000000 : 500000000 took:7.000594e-003 secs bitstrstr5: 750000000 : 750000000 took:6.885374e-003 secs bitstrstr5: 1000000000 : 1000000000 took:6.434349e-003 secs bitstrstr5: 1250000000 : 1250000000 took:6.177053e-003 secs bitstrstr5: 1500000000 : 1500000000 took:6.085954e-003 secs bitstrstr5: 1750000000 : 1750000000 took:8.395146e-003 secs bitstrstr5: 2000000000 : 2000000000 took:7.613071e-003 secs

The code needs a little clean-up, but I'll post it if anyone is interested?


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

In reply to Re^8: Why Boyer-Moore, Horspool, alpha-skip et.al don't work for bit strings. (Thank you Anonymonk + others.) by BrowserUk
in thread Why Boyer-Moore, Horspool, alpha-skip et.al don't work for bit strings. (And is there an alternative that does?) by BrowserUk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.