Re^3: Perl regular expression for amino acid sequence

Hi,

Actually, I think you could use two regexes here:

while ($seq{$k} =~ /([QGYN]{3,6})/g) {
     my $seq = $1;
     next if $seq =~ /(.)\1\1/;    # skip candidates with a residue repeated 3 times in a row
     print "\n$k";
     print "$seq begins at position ", (pos($seq{$k}) - length($s)), "\n";
}
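If you want to try the loop above on its own, here is a minimal self-contained sketch. The sub name two_regex_scan and the sample sequences are made up for illustration, and it assumes $s in the snippet stands for the matched motif, so the start offset is pos() minus the motif's length. It keeps the same two-regex logic: find a 3-6 residue run of Q, G, Y, N, then reject it if any residue appears three times in a row.

```perl
use strict;
use warnings;

# Hypothetical self-contained version of the two-regex loop.
# Assumption: $s in the original stands for the matched motif.
sub two_regex_scan {
    my ($sequence) = @_;
    my @hits;    # each hit is [start_offset, motif]
    while ($sequence =~ /([QGYN]{3,6})/g) {
        my $motif = $1;
        next if $motif =~ /(.)\1\1/;    # reject a residue repeated 3 times
        # pos() is the offset just past the match, so subtract its length
        push @hits, [pos($sequence) - length($motif), $motif];
    }
    return @hits;
}

# Made-up test sequences.
for my $sequence ('AAQGYNAA', 'QGYNNN') {
    my @hits = two_regex_scan($sequence);
    print "$sequence: ",
        (@hits ? join(', ', map { "$_->[1] at $_->[0]" } @hits) : 'no motifs'),
        "\n";
}
```

With these inputs it should report QGYN at offset 2 for the first string and nothing for QGYNNN, because the whole six-character candidate contains NNN and is rejected.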

If this works for you, we could even optimize and consolidate this code a bit. I don't know where $s comes from, but I assume its length isn't changing.

my $length = length $s;    # pull this out of the loop for efficiency
my $sequence = $seq{$k};

while ($sequence =~ /([QGYN]{3,6})/g) {
     my $seq = $1;
     # @+ holds the end offsets of the last match; $+[0] equals pos($sequence) here
     my $pos = $+[0] - $length;
     next if $seq =~ /(.)\1\1/;
     print "\n$k $seq begins at position $pos\n";
}
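To see how the offset variables relate, here is a small self-contained check (match_offsets and the sample string are made up for illustration). After a successful //g match, $-[0] is the offset where the match starts and $+[0] is the offset just past its end, which is also what pos() returns. So pos($seq{$k}) - length($s) from the first loop corresponds to $+[0] - length($s), and if $s is meant to be the matched motif, that is simply $-[0].

```perl
use strict;
use warnings;

# Hypothetical helper: return the motif plus the offsets Perl exposes
# after a successful //g match in scalar context.
sub match_offsets {
    my ($sequence) = @_;
    return unless $sequence =~ /([QGYN]{3,6})/g;
    # $-[0]: offset where the whole match starts
    # $+[0]: offset just past the end of the match (same as pos() here)
    return ($1, $-[0], $+[0], pos($sequence));
}

my ($motif, $start, $end, $pos) = match_offsets('AAQGYNAA');
print "motif=$motif start=$start end=$end pos=$pos\n";
print "pos - length: ", $pos - length($motif), "\n";    # same value as $start
```

For the sample string the match is QGYN, starting at 2 and ending at 6, so pos() minus the motif length gives the same 2 as $-[0].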

update: that was supposed to be print, not printf

Note that this is untested...

Ted Young

($$<<$$=>$$<=>$$<=$$>>$$) always returns 1. :-)

Re^4: Perl regular expression for amino acid sequence
by Roy Johnson (Monsignor) on Dec 01, 2004 at 20:17 UTC

The problem with using two regexes is that if a candidate matches and then gets rejected, pos has already advanced past the whole match. Say you match a string of 6 chars, QGYNNN. What you want from that is QGYNN (right, OP?), but what happens is that all six characters get tossed. That brings me to a flaw in my proposed solution: it will only give you QGYN from the above input. Needs some tweaking.

Caution: Contents may have been coded under pressure.