Re: Perl regular expression for amino acid sequence

Here's one way, though it needs a slightly convoluted calculation to recover the original position:

  # break up three character repeats, inserting spaces
  while ($seq{$k} =~ s/([QGYN])\1\1/$1$1 $1$1/g) { }

  while ($seq{$k} =~ m/([QGYN]{3,6})/g) {
    print "Match: $1 at ", pos($seq{$k})
      - length($1) - 2*(substr($seq{$k}, 0, pos($seq{$k})) =~ tr/ / /),
      "\n";
  }

If you already have spaces in your sequences, you'd have to use some other character.
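As a rough sketch of that idea (not part of the code above): the separator can be passed in as a parameter. The helper name find_runs_with_sep, its arguments, and the choice of '-' below are all made up for illustration. It also works on a plain scalar copy rather than $seq{$k}, and reads the match start from @- instead of computing it from pos.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [match, original_offset] pairs.
# $sep is assumed to be one character that never occurs in the data.
sub find_runs_with_sep {
    my ($seq, $sep) = @_;
    die "separator must be a single character outside [QGYN]\n"
        unless length($sep) == 1 && $sep !~ /[QGYN]/;
    die "separator already occurs in the sequence\n"
        if index($seq, $sep) >= 0;

    # break up three character repeats, inserting the separator
    1 while $seq =~ s/([QGYN])\1\1/$1$1$sep$1$1/g;

    my @hits;
    while ($seq =~ m/([QGYN]{3,6})/g) {
        # save these first: the inner match below overwrites $1 and @-
        my ($run, $start) = ($1, $-[1]);

        # count separators in a copy of the prefix, leaving $seq alone
        my $prefix   = substr($seq, 0, $start);
        my $inserted = () = $prefix =~ /\Q$sep\E/g;

        # each insertion added two characters: the separator and one
        # duplicated residue
        push @hits, [ $run, $start - 2 * $inserted ];
    }
    return @hits;
}

# a sequence that already contains a space, so '-' is used instead
printf "Match: %s at %d\n", @$_
    for find_runs_with_sep('xx xxxxGNNNxx', '-');
```

The up-front checks are there because the position arithmetic silently goes wrong if the separator is longer than one character or already appears in the input.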

Updated: Changed 5 to 6. I thought the original had a "5", but it was just the tiny fonts on my monitor.


Replies are listed 'Best First'.

Re^2: Perl regular expression for amino acid sequence
by ikegami (Patriarch) on Dec 01, 2004 at 21:21 UTC

>perl script.pl
Match: GNN at 7
Match: GNN at 7
Match: GNN at 7
Match: GNN at 7
Match: GNN at 7
Match: GNN at 7
Match: GNN at 7
Match: GNN at 7
...

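It seems my Perl's tr/// clears pos, so each m//g starts over from the beginning of the string and the loop never ends. Workaround (save pos and restore it after the tr///):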

use strict;
use warnings;

my %seq;
my $k = 0;

$seq{$k} = 'xxxxxxxGNNNxxxxxxxNNNGYGYxxxxxxxGYGYNNNxxxxxxxNNNGNNNxxxxxxx';

# break up three character repeats, inserting spaces
while ($seq{$k} =~ s/([QGYN])\1\1/$1$1 $1$1/g) { }

while ($seq{$k} =~ m/([QGYN]{3,5})/g) {
   my $saved_pos = pos($seq{$k});
   printf("Match: %s at %d\n", 
      $1,
      pos($seq{$k}) - length($1) - 2*(substr($seq{$k}, 0, pos($seq{$k})) =~ tr/ / /),
   );
   pos($seq{$k}) = $saved_pos;
}

Finally, a solution that works!
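
For what it's worth, here is an alternative sketch that sidesteps the pos problem rather than saving and restoring it: take the match start from $-[1] and count the spaces in a separate copy of the prefix with tr/ // (empty replacement list, so it only counts). find_runs is a hypothetical helper name, and I use {3,6} as in the updated node. This is not the code from either post above, just one way it could be written.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [match, original_offset] pairs.
sub find_runs {
    my ($seq) = @_;

    # break up three character repeats, inserting spaces
    1 while $seq =~ s/([QGYN])\1\1/$1$1 $1$1/g;

    my @hits;
    while ($seq =~ m/([QGYN]{3,6})/g) {
        my $start  = $-[1];                    # match offset in the modified string
        my $prefix = substr($seq, 0, $start);  # a copy; $seq itself is never written to
        my $spaces = ($prefix =~ tr/ //);      # counts spaces without changing $prefix

        # each inserted space came with one duplicated residue, hence 2 *
        push @hits, [ $1, $start - 2 * $spaces ];
    }
    return @hits;
}

printf "Match: %s at %d\n", @$_
    for find_runs('xxxxxxxGNNNxxxxxxxNNNGYGYxxxxxxx');
```

On that input it should print "Match: GNN at 7" and "Match: NNGYGY at 19". Since a match never contains a space, counting spaces up to the match start gives the same number as counting up to pos.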
