As BrowserUk pointed out here, there is no ! regex metacharacter/operator (presumably meaning "not-followed-by-the-following-pattern"). What is probably meant is (?!pattern) (see Look-Around Assertions in perlre; also see perlretut).
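
To make the look-ahead concrete, here is a minimal sketch (the sample string and variable names are made up for illustration, not taken from the thread). It records the offset of every K that is not immediately followed by P:

```perl
use strict;
use warnings;

my $seq = 'AKPAKA';    # illustrative sequence, not from the thread

my @hits;
# (?! P) is a zero-width negative look-ahead: it asserts that the next
# character is not P, without consuming anything.
while ($seq =~ m{ K (?! P) }xmsg) {
    push @hits, pos($seq) - 1;    # offset of the K just matched
}

print "K not followed by P at offsets: @hits\n";
```

The K at offset 1 is followed by P and is skipped; only the K at offset 4 is reported.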

This may be what TamaDP was looking for (there are some other good solutions):

c:\@Work\Perl\monks>perl -wMstrict -le
"my @proteins = qw(
   DAAAAATTLTTTAMTTTTTTCKMMFRPPPPPGGGGGGGGGGGG
   ALTAMCMNVWEITYHKGSDVNRRASFAQPPPQPPPPLLAIKPASDASD
   AAAKAAA  AAAKAAA  XXXXXX
   );
 ;;
 my @new_peptides;
 for my $protein (@proteins) {
   if ($protein =~ s{ (?<= [KR]) (?! P) }{=}xmsg) {
     push @new_peptides, split ('=', $protein);
     }
   }
 ;;
 for my $peptide (@new_peptides) {
   print qq{Peptide is '$peptide'};
   }
"
Peptide is 'DAAAAATTLTTTAMTTTTTTCK'
Peptide is 'MMFRPPPPPGGGGGGGGGGGG'
Peptide is 'ALTAMCMNVWEITYHK'
Peptide is 'GSDVNR'
Peptide is 'R'
Peptide is 'ASFAQPPPQPPPPLLAIKPASDASD'
Peptide is 'AAAK'
Peptide is 'AAA'
Peptide is 'AAAK'
Peptide is 'AAA'

Input proteins on which s/// makes no substitution (here, XXXXXX) are neither split nor push-ed onto the output array.
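
The filtering relies on the return value of s///g, which is the number of substitutions made, or false when there were none. A small sketch of just that behaviour, with two illustrative inputs and a hypothetical @report array to collect the results:

```perl
use strict;
use warnings;

my @report;
for my $protein (qw(AAAKAAA XXXXXX)) {
    my $copy = $protein;    # work on a copy; s/// edits its target in place
    # s///g returns the number of substitutions, or false if none were made
    my $count = ($copy =~ s{ (?<= [KR]) (?! P) }{=}xmsg) || 0;
    push @report, "$protein: $count substitution(s), now '$copy'";
}

print "$_\n" for @report;
```

AAAKAAA gets one '=' marker after the K, while XXXXXX comes back unchanged with a count of 0, which is why the if test skips it.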

Yet another solution might be:

c:\@Work\Perl\monks>perl -wMstrict -le
"my @proteins = qw(
   DAAAAATTLTTTAMTTTTTTCKMMFRPPPPPGGGGGGGGGGGG
   ALTAMCMNVWEITYHKGSDVNRRASFAQPPPQPPPPLLAIKPASDASD
   AAAKAAA  AAAKAAA  XXXXXX
   );
 ;;
 my $cleave = qr{ (?<= [KR]) (?! P) }xms;
 ;;
 my @peptides =
   map  split($cleave),
   grep m{ $cleave }xms,
   @proteins
   ;
 ;;
 print qq{Peptide is '$_'} for @peptides;
"
Peptide is 'DAAAAATTLTTTAMTTTTTTCK'
Peptide is 'MMFRPPPPPGGGGGGGGGGGG'
Peptide is 'ALTAMCMNVWEITYHK'
Peptide is 'GSDVNR'
Peptide is 'R'
Peptide is 'ASFAQPPPQPPPPLLAIKPASDASD'
Peptide is 'AAAK'
Peptide is 'AAA'
Peptide is 'AAAK'
Peptide is 'AAA'

The central statement of this code might be documented as follows (the steps are read right-to-left, bottom-to-top):

my @peptides =           # 4. and the pieces are peptides.
  map  split($cleave),   # 3. split at each cleavage point...
  grep m{ $cleave }xms,  # 2. that can be cleaved, ...
  @proteins              # 1. for each protein...
  ;
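
For anyone who would rather run this from a script file than as a Windows one-liner, the same map/grep pipeline could be wrapped in a sub. This is only a sketch: the name cleave_peptides and the three sample inputs are hypothetical, not from the thread.

```perl
use strict;
use warnings;

# Cleave after K or R, except when the next residue is P.
my $cleave = qr{ (?<= [KR]) (?! P) }xms;

# cleave_peptides() is a hypothetical helper name, not from the thread.
sub cleave_peptides {
    my @proteins = @_;
    return
        map  { split $cleave, $_ }    # split at each cleavage point
        grep { $_ =~ $cleave }        # keep only proteins that can be cleaved
        @proteins;
}

my @peptides = cleave_peptides(qw(AAAKAAA AKPA XXXXXX));
print "Peptide is '$_'\n" for @peptides;
```

Here AKPA is dropped because its only K is followed by P, and XXXXXX is dropped because it has no K or R at all, so only AAAK and AAA are printed.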

Update: After sober consideration, I changed the code example above to its present form. The previous code contained
my $cleave = qr{ [KR] (?! P) }xms;
...
map split(m{ (?<= $cleave) }xms),
grep m{ $cleave }xms,
...
which doesn't really respect the DRY principle that I was aiming to exemplify.

Give a man a fish: <%-{-{-{-<

In reply to Re^2: Bioinformatics: Regex loop, no output by AnomalousMonk
in thread Bioinformatics: Regex loop, no output by TamaDP
