Hi, grizzley, What I mean "fuzzy match" is that "Not a perfect match". For example, if I have a forward primer: "agct" and I want to find it in the following sequences and trim off the regions before it:
>seq1
aaagctcccc
>seq2
aaacctgggg
if I want to perform a "Perfect match" search and trim, only seq1 contains "agct", after the trim, seq1 becomes "agctcccc", since I want to keep the primer in the sequence. we will not be able to find a perfect match to "agct" in seq2,therefore, it is untouched.
For a "fuzzy match" search, if I allow up to 1 or 2 mismatch for "agct", the both seq1 and seq2 would trimed. seq1 contains "agct", and seq2 contains "acct" which containing 1 mistmatch by substitute "g" to "c", so after the trim, it supposed to be:
>seq1
agctcccc
>seq2
acctgggg
but since there might be many different combinations, for 1 mismatch to "agct", it could be "acct", "ggct" etc, "Bio::Grep" can do such "fuzzy match" search, but only output sequences that contain such regions. I think it does not perform the trimming as downstream process. I do not know how to using perl to write a program to do so? I would really appreciate your help.
| [reply] |
In that case there are at least two possibilities:
1. If Bio::Perl can do match with wildcards, you can do fuzzy match 'agct.*'
2. Do fuzzy match with Bio::Perl and use returned matched string to do perfect match or better substitute: s/.*?(?=$returnedstring)//
| [reply] [d/l] |
| [reply] |