I see checksumming works quite well. I had expected more false positives, because different sequences can produce the same checksum.
The algorithm is probably being flattered by my test data (16-bit random values). Although I am seeing a few false positives, it doesn't take long to verify them using a value-by-value comparison of a relatively small number of actual values (currently 100, but that's flexible). With 100 x 16-bit numbers there is a huge number of possibilities (65536^100 = 2^1600, roughly 1e481), so finding a repeat of those 100 is a pretty strong indicator of having found the start of the next repeat.
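For anyone following along, here is a rough Python sketch of the checksum-then-verify idea as I understand it. It is not the actual code under discussion: the function name `find_repeat_start`, the plain additive checksum and the rolling update are all assumptions made for illustration. The checksum only nominates candidates; the value-by-value comparison of the window is what rejects the false positives.

```python
import random

def find_repeat_start(values, window=100):
    """Return the smallest offset > 0 at which the first `window` values recur,
    or None if they never do. Hypothetical helper, for illustration only."""
    if len(values) < 2 * window:
        return None
    target = sum(values[:window])        # checksum of the tell-tale window (additive: an assumption)
    rolling = sum(values[1:window + 1])  # checksum of the window starting at offset 1
    for offset in range(1, len(values) - window + 1):
        # Cheap test first; only on a checksum hit do the value-by-value check.
        if rolling == target and values[offset:offset + window] == values[:window]:
            return offset
        if offset + window < len(values):
            # Slide the window one place: add the new value, drop the old one.
            rolling += values[offset + window] - values[offset]
    return None

# Test data in the spirit of the post: 16-bit random values, repeated.
random.seed(1)
period = [random.randrange(65536) for _ in range(1000)]
data = period * 3
print(find_repeat_start(data))
```

An additive checksum is about the weakest choice available, so it will throw up more false positives than a better hash would, but each one costs only a single short comparison to dismiss.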
I realise that with bio data, especially if it's just ACGT, you'd need to use a much longer tell-tale sequence to avoid many false positives.
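To put a rough number on "much longer", the arithmetic can be sketched as below. It assumes every symbol is independent and uniformly random, which real sequence data certainly is not, so treat the result as a lower bound. The helper name `window_for_same_space` is made up for this example.

```python
import math

def window_for_same_space(ref_len, ref_bits, symbol_bits):
    """Window length needed, at `symbol_bits` per symbol, to cover as many
    possible windows as `ref_len` symbols of `ref_bits` each.
    Hypothetical helper; assumes independent, uniformly random symbols."""
    return math.ceil(ref_len * ref_bits / symbol_bits)

# 100 x 16-bit values carry 1600 bits; an ACGT base carries 2 bits.
acgt_window = window_for_same_space(100, 16, 2)
digits = int(1600 * math.log10(2))  # order of magnitude of 2**1600
print(acgt_window, digits)
```

So a 100-base window covers only 4^100 = 2^200 possibilities, and it would take around 800 bases to match the 100 x 16-bit case, before allowing for the repetitiveness of real genomic data.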