in reply to Re^2: Perl regular expression for amino acid sequence
in thread Perl regular expression for amino acid sequence

This solution is actually wrong: when a match contains a triple, it first tries to drop characters from the front of the match instead of trying to shorten the match from the end. Of course, this only matters if QGNNNG should be considered a series of two valid amino acid sequences, QGN and NNG.

my $cur;
while ($seq{$k} =~ /([QGYN]{3,6})/g) {
    $cur = $1;
    # A triple repeat invalidates the match: retry one character past its start.
    pos($seq{$k}) -= length($cur) - 1 and next if $cur =~ /(.)\1\1/;
    print "\n$k";
    print $cur . " begins at position ", (pos($seq{$k}) - length($cur)), "\n";
}
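To make the complaint concrete, here is a minimal sketch that runs the same loop on a single string. The helper name `scan_skip_front` is hypothetical, and it assumes the reported position is the 0-based start of the fragment (pos minus the fragment length). It collects hits instead of printing them.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): same logic as the loop above,
# but returns [fragment, 0-based start] pairs for one input string.
sub scan_skip_front {
    my ($seq) = @_;
    my @hits;
    while ($seq =~ /([QGYN]{3,6})/g) {
        my $cur = $1;
        if ($cur =~ /(.)\1\1/) {
            # Triple found: restart the search one character after the match start.
            pos($seq) -= length($cur) - 1;
            next;
        }
        # Assumption: position means the start offset of the fragment.
        push @hits, [$cur, pos($seq) - length($cur)];
    }
    return @hits;
}

printf "%s begins at position %d\n", @$_ for scan_skip_front('QGNNNG');
# prints "NNG begins at position 3"
```

On QGNNNG the loop discards the leading characters one at a time until only NNG is left, so QGN is never reported.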

Re^4: Perl regular expression for amino acid sequence
by Roy Johnson (Monsignor) on Dec 01, 2004 at 20:53 UTC
    The fix is something like:
    my $cur;
    while ($seq{$k} =~ /([QGYN]{3,6})/g) {
        $cur = $1;
        # Rewind to the start of the match.
        pos($seq{$k}) -= length($cur);
        # Cut the match at the first triple, keeping two of the repeated character.
        $cur =~ s/(.)\1\1.*/$1$1/;
        if (length($cur) >= 3) {
            pos($seq{$k}) += length($cur);
        }
        else {
            ++pos($seq{$k});
            next
        }
        print "\n$k";
        print $cur . " begins at position ", (pos($seq{$k}) - length($cur)), "\n";
    }
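    For anyone who wants to try this on a single string, a rough sketch follows. The wrapper name `scan_truncate` is hypothetical, and it assumes the reported position is the 0-based start of the fragment. It returns the hits rather than printing them.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not from the thread): truncate each match at its
# first triple and return [fragment, 0-based start] pairs.
sub scan_truncate {
    my ($seq) = @_;
    my @hits;
    while ($seq =~ /([QGYN]{3,6})/g) {
        my $cur = $1;
        pos($seq) -= length($cur);      # rewind to the start of the match
        $cur =~ s/(.)\1\1.*/$1$1/;      # cut at the first triple, keep two copies
        if (length($cur) < 3) {
            ++pos($seq);                # too short now: retry one character later
            next;
        }
        pos($seq) += length($cur);      # resume right after the shortened match
        push @hits, [$cur, pos($seq) - length($cur)];
    }
    return @hits;
}

printf "%s begins at position %d\n", @$_ for scan_truncate('QGNNNG');
# prints "QGNN begins at position 0"
```

    On QGNNNG this reports QGNN at offset 0, and the remaining NG is too short to match again. On NNNQG the shortened match NN is rejected, and the retry one character later reports NNQG at offset 1.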

    Caution: Contents may have been coded under pressure.