in reply to Re^2: Perl regular expression for amino acid sequence
in thread Perl regular expression for amino acid sequence

Hi,

Actually, I think you could use two regexes here:

    while ($seq{$k} =~ /([QGYN]{3,6})/g) {
        my $seq = $1;
        next if $seq =~ /(.)\1\1/;
        print "\n$k";
        print "$seq begins at position ", (pos($seq{$k}) - length($s)), "\n";
    }

If this works for you, we could even optimize and consolidate this code a bit. I don't know where $s comes from, but I assume its length doesn't change inside the loop.

    my $length = length $s;    # Pull this out of the loop for efficiency.
    my $sequence = $seq{$k};
    while ($sequence =~ /([QGYN]{3,6})/g) {
        my $seq = $1;
        my $pos = $-[0] - $length;    # @- holds the start offsets of the last match
        next if $seq =~ /(.)\1\1/;
        print "\n$k $seq begins at position $pos\n";
    }

update: that was supposed to be print, not printf

Note that this is untested...
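
For anyone who wants to try the two-regex idea on its own, here is a minimal self-contained sketch. The sample data and the find_runs helper are made up for illustration, and since I don't know what $s is, the sketch simply reports the 0-based start offset of each run taken from @-, which may not be the position the OP actually wants.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real %seq comes from the OP's script.
my %seq = (demo => 'AAQGYNAAQQQGAAGYG');

# Return [run, start_offset] pairs for runs of 3 to 6 residues from
# [QGYN] that do not contain the same residue three times in a row.
sub find_runs {
    my ($sequence) = @_;
    my @hits;
    while ($sequence =~ /([QGYN]{3,6})/g) {
        # Save the capture and its offset before running another regex.
        my ($run, $start) = ($1, $-[1]);
        next if $run =~ /(.)\1\1/;    # second regex: reject triples
        push @hits, [$run, $start];
    }
    return @hits;
}

for my $k (sort keys %seq) {
    for my $hit (find_runs($seq{$k})) {
        my ($run, $start) = @$hit;
        print "$k $run begins at position $start\n";
    }
}
```

With the sample string above, QGYN (offset 2) and GYG (offset 14) are reported, while QQQG is skipped because of the QQQ triple.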

Ted Young

($$<<$$=>$$<=>$$<=$$>>$$) always returns 1. :-)

Re^4: Perl regular expression for amino acid sequence
by Roy Johnson (Monsignor) on Dec 01, 2004 at 20:17 UTC
    The problem with using two regexes is: if it matches and then gets rejected, your pos counter is still incremented. Say you match a string of 6 chars, QGYNNN. What you want from that is QGYNN (right, OP?), but what happens is that all six characters get tossed.
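
    A small sketch of that behaviour, using the QGYNNN example. The scan helper is a hypothetical name used only for this demonstration; it records where the /g search resumes after a rejected match.

```perl
use strict;
use warnings;

# Scan a sequence with the two-regex approach and record what happens
# to each candidate run. Returns (\@kept, \@rejected), where each
# rejected entry is [run, offset at which the /g search resumes].
sub scan {
    my ($sequence) = @_;
    my (@kept, @rejected);
    while ($sequence =~ /([QGYN]{3,6})/g) {
        my $run = $1;
        my $end = pos($sequence);    # already past the whole greedy match
        if ($run =~ /(.)\1\1/) {
            push @rejected, [$run, $end];
            next;
        }
        push @kept, $run;
    }
    return (\@kept, \@rejected);
}

my ($kept, $rejected) = scan('QGYNNN');
print "rejected $_->[0]; search resumes at offset $_->[1]\n" for @$rejected;
print "kept: ", scalar(@$kept), "\n";
```

    The greedy match takes all six characters, the triple check rejects them, and the next search starts at offset 6, so nothing from QGYNNN is ever reported.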

    That brings me to a flaw in my proposed solution: it will only give you QGYN from the above input. Needs some tweaking.


    Caution: Contents may have been coded under pressure.