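The following uses a look-behind (?<=[KR]) on the character class [KR] and a negative look-ahead (?!P), which rejects any position that is followed by P, as the split pattern to slice up the protein. Both assertions are zero-width, so no residues are consumed by the split: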
    use strict;
    use warnings;

    my @proteins = qw(
        DAAAAATTLTTTAMTTTTTTCKMMFRPPPPPGGGGGGGGGGGG
        ALTAMCMNVWEITYHKGSDVNRRASFAQPPPQPPPPLLAIKPASDASD
        DAAAAATTLTTTAMTTTTTTCK
    );

    for my $protein (@proteins) {
        # Split after K or R, except where the next residue is P
        my @peptides = split /(?<=[KR])(?!P)/, $protein;

        # Skip proteins that were not cut into at least two peptides
        next if @peptides < 2;

        print "Protein: $protein\n";
        print "Peptides:\n";
        print " $_\n" for @peptides;
    }
Prints:
    Protein: DAAAAATTLTTTAMTTTTTTCKMMFRPPPPPGGGGGGGGGGGG
    Peptides:
     DAAAAATTLTTTAMTTTTTTCK
     MMFRPPPPPGGGGGGGGGGGG
    Protein: ALTAMCMNVWEITYHKGSDVNRRASFAQPPPQPPPPLLAIKPASDASD
    Peptides:
     ALTAMCMNVWEITYHK
     GSDVNR
     R
     ASFAQPPPQPPPPLLAIKPASDASD
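The third protein produces no output: its only K is the last residue, split drops the trailing empty field, and the single remaining piece is skipped by the "next if @peptides < 2" test. A minimal sketch of the edge cases, wrapping the same split in a helper (the name digest and the short test strings are made up for illustration and are not part of the code above):

```perl
use strict;
use warnings;

# Hypothetical helper: cut after K or R unless the next residue is P.
# Call it in list context to get the peptides back.
sub digest {
    my ($protein) = @_;
    return split /(?<=[KR])(?!P)/, $protein;
}

# K at the very end: the trailing empty field is dropped, one piece remains
my @end_k = digest('DAAAAATTLTTTAMTTTTTTCK');
print scalar(@end_k), "\n";    # prints 1

# K followed by P: the negative look-ahead blocks the cut
my @kp = digest('AAKPAA');
print "@kp\n";                 # prints AAKPAA

# Two cut sites in a row: the lone R becomes its own peptide
my @rr = digest('GSDVNRRASF');
print "@rr\n";                 # prints GSDVNR R ASF
```

If single-peptide proteins should be reported as well, dropping the "next if" line would be one way to do it.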
In reply to Re: Bioinformatics: Regex loop, no output
by GrandFather
in thread Bioinformatics: Regex loop, no output
by TamaDP