Dear Monk,
I went through your post; it is a well-presented approach to similarity search, i.e. finding the best match among a given set of strings (correct me if I am wrong). But here we are dealing with a single string, and the aim is to find all repeats present in that string (the string is a DNA sequence, from the file that I referred to earlier in my post). The minimum length of the substring should be 3 and the maximum should be n-1 (n = length of the string).
Here is an example to give an idea of it (positions are counted from 0).
e.g. $text='AAATGAAAT'
Window or substring of length =3
set of all possible substrings={AAA,AAT,ATG,TGA,GAA,AAA,AAT}
AAA => Occurred 2 times at 0 and 5
AAT => Occurred 2 times at 1 and 6
no more repeats found
Window or substring of length =4
set of all possible substrings={AAAT,AATG,ATGA,TGAA,GAAA,AAAT}
AAAT => Occurred 2 times at 0 and 5
no more repeats found
Window or substring of length =5
set of all possible substrings={AAATG,AATGA,ATGAA,TGAAA,GAAAT}
no repeats found
and so on, up to a length of 9-1=8,
because there could be a case where the string looks like this: $text='AAAAAAAAA'
then,
Window or substring of length =8
set of all possible substrings={AAAAAAAA,AAAAAAAA}
AAAAAAAA => Occurred 2 times at 0 and 1
So I have defined the maximum length of the substring to be n-1 (9-1=8) so that we don't miss repeats of any length.
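To pin down the expected output, here is a minimal sketch of the brute-force procedure described above. It assumes 0-based positions and counts overlapping occurrences; the name find_repeats and its return structure are made up for illustration. It is not meant to be fast.

```perl
use strict;
use warnings;

# Hypothetical helper (name and return shape are assumptions for illustration).
# Returns a hash ref: window length => { substring => [start positions] },
# keeping only substrings that occur at least twice.
# Positions are 0-based; overlapping occurrences are counted.
sub find_repeats {
    my ($text, $min_len) = @_;
    $min_len = 3 unless defined $min_len;
    my $n = length $text;
    my %repeats;

    # Window lengths run from the minimum up to n-1.
    for my $len ($min_len .. $n - 1) {
        my %seen;    # substring => list of start positions
        for my $pos (0 .. $n - $len) {
            push @{ $seen{ substr($text, $pos, $len) } }, $pos;
        }
        for my $sub (keys %seen) {
            $repeats{$len}{$sub} = $seen{$sub} if @{ $seen{$sub} } > 1;
        }
    }
    return \%repeats;
}

my $text    = 'AAATGAAAT';
my $repeats = find_repeats($text);
for my $len (sort { $a <=> $b } keys %$repeats) {
    print "Window or substring of length = $len\n";
    for my $sub (sort keys %{ $repeats->{$len} }) {
        my @pos = @{ $repeats->{$len}{$sub} };
        printf "%s => Occurred %d times at %s\n",
            $sub, scalar @pos, join(', ', @pos);
    }
}
```

For 'AAATGAAAT' this reports AAA at 0, 5 and AAT at 1, 6 for length 3, AAAT at 0, 5 for length 4, and nothing for lengths 5 to 8. It extracts every window for every length, so the work grows roughly with the cube of the string length, which is the part I would like to avoid on a real DNA file.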
Yes, as you said, what I have will never be fast, and I agree with you; that is why I need a faster algorithm or a faster alternative.
If I am still not clear, please let me know.
In reply to Re^5: Exact string matching
by saranrsm
in thread Exact string matching
by Anonymous Monk