in reply to Perl regular expression for amino acid sequence
You might have to consider each character separately, which leads to a long ugly string of alternations. The first char matches your character class. The second is either not a repeat, or is a repeat followed by not a repeat. The third is either not a repeat of the second, or a repeat followed by not a repeat.

    /(?:(?!(.)\1\1)[QGYN]){3,6}/;

After that, the pattern is repeated for the 4th and 5th characters, but they're all optional and nested (so if you don't have the 4th char, you don't look for the 5th). The 6th char doesn't need to check for repetitions, because it was checked by the pattern for the 5th char.
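
To make the nesting easier to see, here is a rough sketch that assembles the same kind of pattern piece by piece. `build_pattern` and its arguments are hypothetical names, not something from the OP's code, and it uses `\g{N}` (Perl 5.10+) in place of `\N`. It assumes `$min < $max`.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original thread): builds the nested
# "no three identical in a row" pattern for lengths $min..$max.
sub build_pattern {
    my ($class, $min, $max) = @_;
    die "need min < max\n" unless $min < $max;

    # One step: either differ from the previous char (held in group $n),
    # or repeat it and require that the following char differs.
    my $step = sub {
        my ($n) = @_;
        return "((?!\\g{$n})$class|\\g{$n}(?!\\g{$n}))";
    };

    # Group 1 is the whole match and group 2 is the first char, so the
    # step for char $i refers back to group $i.
    my $head = "($class)" . join '', map { $step->($_) } 2 .. $min;

    # Chars $min+1 .. $max-1 are optional and nested, built innermost first.
    # The last char is a bare class: the step before it already did the check.
    my $tail = "$class?";
    for my $i (reverse $min + 1 .. $max - 1) {
        $tail = '(?:' . $step->($i) . $tail . ')?';
    }
    return qr/($head$tail)/;
}

my $re = build_pattern('[QGYN]', 3, 6);
for my $s (qw(QQGYNN QQQG QGGG)) {
    if ($s =~ $re) {
        printf "%s: matched '%s' at offset %d\n", $s, $1, $-[1];
    }
    else {
        print "$s: no match\n";
    }
}
# QQGYNN: matched 'QQGYNN' at offset 0
# QQQG: matched 'QQG' at offset 1
# QGGG: no match
```

Note that `QQQG` still matches, just starting one character later, since `QQG` on its own has no triple.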
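Update: adjusted to fit OP's code snippet.

    while ($seq{$k} =~ /(([QGYN])                    # 1st char
                         ((?!\2)[QGYN]|\2(?!\2))     # 2nd char
                         ((?!\3)[QGYN]|\3(?!\3))     # 3rd char
                         (?:((?!\4)[QGYN]|\4(?!\4))  # optional 4th
                           (?:((?!\5)[QGYN]|\5(?!\5))  # optional 5th
                             [QGYN]?)?)?)            # optional 6th, no check needed
                       /xg) {
        print "\n$k";
        # pos() is the offset just past the match, so subtract the match length
        print $1." begins at position ", (pos($seq{$k})-length($1)), "\n";
    }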
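
If you want to try the loop end to end, something like the following might do. It is only a sketch: the `%seq` contents and the `find_hits` helper are made up for illustration (the OP's real data isn't shown). It uses the compact lookahead pattern from earlier in this post, wrapped in an outer capture, so the inner `(.)` becomes group 2. The start offset comes from Perl's built-in `@-` array (`$-[1]` is where group 1 begins) rather than from `pos()` minus a length.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real %seq is not shown in the thread.
my %seq = (
    demo1 => 'AAQGYNAA',
    demo2 => 'QQQG',
);

# Returns a list of [key, matched text, 0-based start offset].
sub find_hits {
    my ($seqs) = @_;
    my @hits;
    for my $k (sort keys %$seqs) {
        # At each position: refuse to start a run of three identical
        # chars, then take one char from the class; 3 to 6 times.
        while ($seqs->{$k} =~ /((?:(?!(.)\2\2)[QGYN]){3,6})/g) {
            push @hits, [$k, $1, $-[1]];
        }
    }
    return @hits;
}

for my $hit (find_hits(\%seq)) {
    printf "%s: %s begins at position %d\n", @$hit;
}
# demo1: QGYN begins at position 2
# demo2: QQG begins at position 1
```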