BrowserUk:

Are you familiar with the Boyer-Moore string search? I bet you could use the same basic idea of building a state machine of the next address to check. But rather than analyzing the needle byte by byte, do it in parallel for each shifted variant. Something like the below. Note: I'm going to pretend the byte size is 4 bits here, and I also don't know boyer-moore by heart, so I'm winging it to give you the gist of it. If you like the idea, you can hash out the annoying details (heh!):

Given Needle : 1010 0011 1101 1100 Haystack: 0001 0010 1101 1100 1010 1101 0001 1110 1110 0111 1) Make three more copies of given, each bit-shifted Needle A: 1010 0011 1101 1100 Needle B: x101 0001 1110 1110 0xxx Needle C: xx10 1000 1111 0111 00xx Needle D: xxx1 0100 0111 1011 100x Column D C B A 2) Build the BM skip table based on the last complete byte D C B A 0000 ---- ---- ---- 4444 0001 ---W -M-- ---- ---- 0010 --X- ---- ---- ---- 0011 ---W M--- ---- ---- 0100 ---- ---M ---- ---- 0101 -Y-W ---- ---- ---- 0110 --X- ---- ---- ---- 0111 ---W ---- ---M --M- 1000 ---- --M- ---- ---- 1001 ---W ---- ---- ---- 1010 Z-X- ---- ---- ---- 1011 ---W ---- ---- ---M 1100 ---- ---- ---- M--- 1101 -Y-W ---- M--- ---- 1110 --X- ---- -M-- -M-- 1111 ---W ---- --M- ---- - : an entry I didn't try to determine yet M : Matched byte position, match previous needle char to previous table column. Z : Perfect match found Y : Perfect match if byte after position A matches remnant B X : Perfect match if byte after position A matches remnant C W : Perfect match if byte after position A matches remnant D 1-4: mismatch, skip 1..4 bytes and restart scan at position A

Geh! I've got to get to work now. I'll update this at lunch time. Anyway, the state table will show how far you can skip ahead so you don't need to examine every byte in the haystack. Read the [no such wiki, Boyer-Moor string search] page, and you'll see what I'm talking about. I've drawn four state machines here by having 4 output edges based on which needle we're tracking, but in the final version, we'd combine the tables into simpler entries. If you like the idea, we can investigate that a bit. I've got some time this evening, so we could code something up.

...roboticus

When your only tool is a hammer, all problems look like your thumb.


In reply to Re: [OT] The interesting problem of comparing bit-strings. by roboticus
in thread [OT] The interesting problem of comparing bit-strings. by BrowserUk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.