Imagine the repeat sequence we are trying to detect is:

Abcdeabcdefabcdefg

So the input stream will look something like:

...AbcdeabcdefabcdefgAbcdeabcdefabcdefgAbcdeabcdefabcdefg...

but we may start sampling at any point. eg.

deabcdefabcdefgAbcdeabcdefabcdefgAbcdeabcdefabcdefgAbcdeabcdefabcd +efgAbcdeabcdefabcdefgAbcdeabcdefabcdefg 1 2 3 4 5 6

If we maintain rolling checksums, then:

But this would require us to store checksums covering every start-point and every end-point, for 2 full-but-disjoint repeats. That's N^2 checksums, which is more expensive than storing the data.

Unless there is some way to determine that we can discard those checksums that cannot be the one we want.

Ie. If when we read the 3rd value, and calculate the 2-value checksum for that value and its immediate predecessor and it doesn't match the checksum for the first two values, we know that our sequence must be longer than 2, and can discard both 2-value checksums.

Ditto, when we read the fourth value, and calculate the checksum for the last 3 values and it does not match that of the first 3, we can discard both.

But, when we calculated the 3-value checksum having read the fourth value, we needed the 2-value checksums; and when we come to calculate the 4-value checksums having read the 5 value, we need the 3-value checksums;

So the discarding of earlier checksums has to be deferred by one iteration cycle.


In reply to Re: Algorithm inspiration required. (Further thoughts on rolling checksums) by BrowserUk
in thread Algorithm inspiration required. by BrowserUk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.