OK, I see that checksumming works quite well. I had expected more false positives, since different sequences can produce the same checksum.
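To make the false-positive concern concrete, here is a minimal sketch of how a checksum match can be treated as a hint only and then confirmed by comparing the actual values. The function name `repeated_windows`, the fixed window `width`, and the deliberately weak sum-based checksum are all assumptions made for illustration, not what was used earlier in this thread.

```python
from collections import defaultdict

def repeated_windows(values, width):
    """Return groups of start indices whose windows of `width` values are truly equal."""
    # Step 1: bucket every window by a (deliberately weak) checksum.
    buckets = defaultdict(list)
    for start in range(len(values) - width + 1):
        checksum = sum(values[start:start + width]) % 65521
        buckets[checksum].append(start)

    # Step 2: a shared checksum is only a hint, so compare the windows themselves.
    groups = []
    for starts in buckets.values():
        exact = defaultdict(list)
        for start in starts:
            exact[tuple(values[start:start + width])].append(start)
        groups.extend(g for g in exact.values() if len(g) > 1)
    return groups

# [1, 2, 3] and [3, 2, 1] share the checksum 6, but only the true repeat survives.
print(repeated_windows([1, 2, 3, 9, 3, 2, 1, 9, 1, 2, 3], 3))
```

With a stronger checksum the second step rarely has anything to reject, which would fit the observation that false positives turn out to be uncommon in practice.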
The suffix array method (from bioinformatics) described above, on the other hand, is exact, but admittedly more difficult to implement efficiently.
And if you periodically need to throw out old values and feed in new ones, that is an extra requirement on the algorithm, and one that is probably not yet implemented.
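For comparison, a rough sketch of the suffix array idea in Python. It builds the array naively by sorting suffixes, which is simple but far from the efficient constructions used in bioinformatics, and the name `longest_repeat` is hypothetical. It is only meant to show why the result is exact: repeats are found by comparing the values themselves, never a checksum.

```python
def longest_repeat(values):
    """Longest run that occurs at least twice, found via a naively built suffix array."""
    n = len(values)
    # Naive construction: sort suffix start positions by the suffix contents.
    suffix_array = sorted(range(n), key=lambda i: values[i:])

    best_len, best_start = 0, 0
    for a, b in zip(suffix_array, suffix_array[1:]):
        # Neighbours in sorted order are the suffixes sharing the longest prefixes.
        k = 0
        while a + k < n and b + k < n and values[a + k] == values[b + k]:
            k += 1
        if k > best_len:
            best_len, best_start = k, a
    return values[best_start:best_start + best_len]

print(longest_repeat([1, 2, 3, 9, 3, 2, 1, 9, 1, 2, 3]))
print(longest_repeat("banana"))
```

Note that this sketch has no way to drop old values or append new ones other than rebuilding the whole array, which is exactly the missing piece mentioned above.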