in reply to Understanding a script that looks for mismatches to a regex
First off, congratulations on getting the onlyIDleft.
Secondly, I think the changes you made in sub make_approximate() are the right ones for converting it from nucleotides to amino acids.
Thirdly, I think the problem arises when you use square brackets in your original motif for matching. I ran it using the following:
my $motif = '[IFVL][RK][AGST]'; my $answer = make_approximate($motif, 1); print $answer, "\n";
output: [(?:I(?:F(?:V(?:L][(?:R(?:K][(?:A(?:G(?:S(?:T]|.])|.T])|.ST])|.GST])|.][AGST])|.K][AGST])|.][RK][AGST])|.L][RK][AGST])|.VL][RK][AGST])|.FVL][RK][AGST])
Since [IFVL] matches a single character (I or F or V or L), it needs to stay together as one unit. If you use square brackets such as [IFVL], though, the subroutine breaks this unit apart. Instead of the output we get above, we would have wanted something like this:
(?:[IFVL](?:[RK].|.[AGST])|.[RK][AGST])
So, if you want to use square brackets, you will need to incorporate a step that checks for opening and closing of square brackets and acts differently if inside brackets. Alternatively, you could substitute in placeholders for groupings like [IFVL]. This might be better since you could have a hash key for polar amino acids, small and large hydrophobic, etc. Anyway, I’ll leave that for you to think about.
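To give an idea of the first option, here is a minimal sketch that splits the motif into units before recursing, so that a bracketed class such as [IFVL] is never broken apart. The names make_approximate_units() and approx_from_tokens() are made up for this illustration and are not your sub; the sketch also assumes the motif contains only plain letters and simple bracketed classes, and that every unit is allowed to mismatch.

```perl
use strict;
use warnings;

# Hypothetical helper: build the approximate pattern from a list of units,
# where each unit is either a bracketed class like [IFVL] or a single letter.
sub approx_from_tokens {
    my ($mismatches, @tokens) = @_;

    # No mismatches left: the remaining units must match exactly.
    return join('', @tokens) if $mismatches == 0;

    # Every remaining position may mismatch: each one becomes a wildcard.
    return '.' x @tokens if @tokens <= $mismatches;

    my ($first, @rest) = @tokens;
    my $after_match = approx_from_tokens($mismatches,     @rest);  # first unit matched
    my $after_miss  = approx_from_tokens($mismatches - 1, @rest);  # first unit mismatched
    return "(?:$first$after_match|.$after_miss)";
}

# Hypothetical wrapper: split the motif into units, then recurse over them.
# Assumption: anything that is not a letter or a [...] class is ignored.
sub make_approximate_units {
    my ($motif, $mismatches) = @_;
    my @tokens = $motif =~ /(\[[^\]]+\]|[A-Za-z])/g;
    return approx_from_tokens($mismatches, @tokens);
}

my $pattern = make_approximate_units('[IFVL][RK][AGST]', 1);
print "$pattern\n";
```

With one mismatch allowed this should print the pattern shown above, (?:[IFVL](?:[RK].|.[AGST])|.[RK][AGST]). The placeholder idea would slot into the same place: look each unit up in a hash of named groups before building the token list.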
Re^2: Understanding a script that looks for mismatches to a regex
by onlyIDleft (Scribe) on Jul 08, 2012 at 05:11 UTC