in reply to Re^4: Exact string matching
in thread Exact string matching
Dear Monk,
I went through your post. It was a well-presented approach to similarity search, i.e. finding the best match among a given set of strings (correct me if I am wrong). Here, however, we are dealing with a single string, and the aim is to find all repeats present in that string (the string is a DNA sequence, from the file that I referred to earlier in my post). The minimum length of the substring should be 3 and the maximum should be n-1 (n = length of the string).
Here is an example to give an idea of it.
e.g. $text='AAATGAAAT'
Window or substring of length =3
set of all possible substrings={AAA,AAT,ATG,TGA,GAA,AAA,AAT}
AAA => Occurred 2 times at 0 and 5
AAT => Occurred 2 times at 1 and 6
no more repeats found
Window or substring of length =4
set of all possible substrings={AAAT,AATG,ATGA,TGAA,GAAA,AAAT}
AAAT => Occurred 2 times at 0 and 5
no more repeats found
Window or substring of length =5
set of all possible substrings={AAATG,AATGA,ATGAA,TGAAA,GAAAT}
no repeats found
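The windows above can be sketched in Perl roughly as follows. This is only a minimal illustration of the idea, not tuned code; the name `repeats_of_length` is made up for this example, and positions are 0-based as in the listing above.

```perl
use strict;
use warnings;

# Return a hash ref mapping each substring of length $k that occurs more
# than once in $text to an array ref of its 0-based start positions.
# (Hypothetical helper name, used only for this illustration.)
sub repeats_of_length {
    my ($text, $k) = @_;
    my %positions;
    # Slide a window of length $k over the text and record where each
    # distinct substring starts.
    for my $start (0 .. length($text) - $k) {
        push @{ $positions{ substr($text, $start, $k) } }, $start;
    }
    # Keep only substrings seen at two or more positions.
    for my $sub (keys %positions) {
        delete $positions{$sub} if @{ $positions{$sub} } < 2;
    }
    return \%positions;
}

my $text = 'AAATGAAAT';
for my $k (3 .. 5) {
    my $repeats = repeats_of_length($text, $k);
    print "Window or substring of length = $k\n";
    for my $sub (sort keys %$repeats) {
        my @at = @{ $repeats->{$sub} };
        printf "%s => Occurred %d times at %s\n",
            $sub, scalar @at, join(' and ', @at);
    }
    print "no repeats found\n" unless %$repeats;
}
```

Calling this once per window length from 3 to n-1 rescans the whole string every time, which is where the cost comes from.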
and so on, up to a length of 9-1=8.
This is because there can be a case where our string looks like this: $text='AAAAAAAAA'
then,
Window or substring of length =8
set of all possible substrings={AAAAAAAA,AAAAAAAA}
AAAAAAAA => Occurred 2 times at 0 and 1
So I have defined the maximum length of the substring to be n-1 (9-1=8) so that we don't miss repeats of any length.
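The n-1 case depends on counting occurrences that overlap each other. A small sketch of that check, assuming a hypothetical helper called `occurrences` built on Perl's `index`:

```perl
use strict;
use warnings;

# Return the 0-based start positions of every occurrence of $needle in
# $text, including occurrences that overlap each other.
# (Hypothetical helper name, used only for this illustration.)
sub occurrences {
    my ($text, $needle) = @_;
    return if $needle eq '';    # an empty needle would never terminate
    my @found;
    my $pos = index($text, $needle);
    while ($pos != -1) {
        push @found, $pos;
        # Resume one character later (not after the match) so that
        # overlapping occurrences are also found.
        $pos = index($text, $needle, $pos + 1);
    }
    return @found;
}

my $text   = 'AAAAAAAAA';
my $n      = length $text;
my $needle = substr($text, 0, $n - 1);    # the first window of length n-1
my @at     = occurrences($text, $needle);
print "$needle => Occurred ", scalar(@at), " times at @at\n";
```

A window of length n can only start at position 0, so it can never repeat; that is why n-1 is the largest length worth checking.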
Yes, as you said, what I have will never be a fast task, and I agree with you; that is why I need a faster algorithm or a faster alternative.
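One possible direction, offered only as a sketch and not as anything proposed in this thread: a substring of length k+1 can repeat only if its first k characters already repeat, so instead of rescanning the whole string for every length, only the start positions that are still part of a repeat need to be extended. The name `all_repeats` and the shape of the returned structure are assumptions made for this example. It is still quadratic in the worst case (for example a string made of one repeated letter), and suffix trees or suffix arrays are the usual tools when that matters.

```perl
use strict;
use warnings;

# Find every repeated substring of length >= $min_len by growing repeats
# one character at a time. Returns { length => { substring => [starts] } }.
# (Hypothetical helper name and return shape, for illustration only.)
sub all_repeats {
    my ($text, $min_len) = @_;
    my $n = length $text;
    my %by_length;

    # Seed: group start positions by their substring of length $min_len.
    my %groups;
    for my $start (0 .. $n - $min_len) {
        push @{ $groups{ substr($text, $start, $min_len) } }, $start;
    }

    my $k = $min_len;
    while (1) {
        # Drop substrings that occur only once; they cannot be extended
        # into a longer repeat either.
        for my $sub (keys %groups) {
            delete $groups{$sub} if @{ $groups{$sub} } < 2;
        }
        last unless %groups;    # no repeats of length $k, so none longer
        $by_length{$k} = { %groups };

        # Extend only the surviving start positions by one character.
        my %next;
        for my $sub (keys %groups) {
            for my $start (@{ $groups{$sub} }) {
                next if $start + $k >= $n;    # no room left to extend
                push @{ $next{ $sub . substr($text, $start + $k, 1) } }, $start;
            }
        }
        %groups = %next;
        $k++;
    }
    return \%by_length;
}

my $found = all_repeats('AAATGAAAT', 3);
for my $len (sort { $a <=> $b } keys %$found) {
    for my $sub (sort keys %{ $found->{$len} }) {
        print "length $len: $sub at @{ $found->{$len}{$sub} }\n";
    }
}
```

The loop stops at the first length with no repeats, so the n-1 upper limit is reached only when the data actually requires it, as in the 'AAAAAAAAA' case.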
If I am still not clear, please let me know.
Re^6: Exact string matching
by GrandFather (Saint) on Oct 17, 2011 at 10:37 UTC
by saranrsm (Acolyte) on Oct 17, 2011 at 11:35 UTC
by choroba (Cardinal) on Oct 17, 2011 at 15:18 UTC