Hi,
I'm hoping you can help me with an algorithm for finding patterns. I have around 2000 short sequences (each of length 9) that are aligned. I want to extract all patterns shared at the same positions and report the number of occurrences of each.
For example in the following:
1. ACGCATTCA
2. ACTGGATAC
3. TCAGCCATC
I would like the following output (where a full stop represents any character):
(AC....T..) 2 occurrences - pattern between sequences 1 and 2.
(.C.G....C) 2 occurrences - pattern between sequences 2 and 3.
(.C.......) 2 occurrences - pattern between sequences 1 and 3.
As you can see, my current plan is to compare every pair of sequences, which requires (n-1) + (n-2) + ... + 1 = n(n-1)/2 comparisons (about 2 million for 2000 sequences). Is there a more efficient way of doing this with fewer comparisons? If there are any existing algorithms that you think may be better suited, please let me know.
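To make the question concrete, here is a minimal Python sketch of the pairwise approach described above. The function names (`pair_pattern`, `shared_patterns`) are made up for illustration, and it assumes that an "occurrence" means a sequence that took part in at least one pair producing the pattern, which may not be the only reasonable way to count:

```python
from collections import defaultdict
from itertools import combinations

WILDCARD = "."  # stands for "any character", as in the example output


def pair_pattern(a, b):
    """Keep the characters on which a and b agree; use a wildcard elsewhere."""
    return "".join(x if x == y else WILDCARD for x, y in zip(a, b))


def shared_patterns(seqs):
    """Map each pairwise pattern to the 1-based ids of the sequences behind it.

    This is the brute-force version: it visits all n*(n-1)/2 pairs.
    """
    members = defaultdict(set)
    for (i, a), (j, b) in combinations(enumerate(seqs, start=1), 2):
        pattern = pair_pattern(a, b)
        # Skip pairs that agree on no position at all.
        if pattern.count(WILDCARD) < len(pattern):
            members[pattern].update((i, j))
    return dict(members)


seqs = ["ACGCATTCA", "ACTGGATAC", "TCAGCCATC"]
for pattern, ids in shared_patterns(seqs).items():
    print(f"({pattern}) {len(ids)} occurrences - sequences {sorted(ids)}")
```

Each pair costs only 9 character comparisons, so the work is dominated by the number of pairs rather than by the sequence length.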
Thanks