in reply to Re: Bioinformatics: Regex loop, no output
in thread Bioinformatics: Regex loop, no output

The output from your code shows some problems:

    The peptide is DAAAAATTLTTTAMTTTTTTC
    The peptide is MMFRPPPPPGGGGGGGGGGGG
    The peptide is ALTAMCMNVWEITYH
    The peptide is GSDVN
    The peptide is
    The peptide is ASFAQPPPQPPPPLLAIKPASDASD

The K or R residue terminating each split fragment is being incorrectly removed from the output peptides. (At least, I think this is incorrect. TamaDP doesn't show desired output, but seems satisfied with the output examples given in various replies in this thread, which include these residues.) So I assume GSDVN should really be GSDVNR, and the "null" sequence following it should really be the single-residue sequence R. This is all down to the incorrect definition of the s/// match pattern; take a look at some other replies in this thread for what I feel are more correct s/// patterns.
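
For illustration only, here is a minimal sketch of a cut that keeps the terminating K or R on each peptide. It uses split with zero-width assertions rather than the s/// loop from the original code, and the input string is a made-up fragment pieced together from the tail of the output above, not TamaDP's actual data.

```perl
use strict;
use warnings;

# Hypothetical input: assumed reconstruction of the end of the sequence.
my $protein = 'GSDVNRRASFAQPPPQPPPPLLAIKPASDASD';

# Cut at every position that follows a K or R and is not followed by P.
# Both assertions are zero-width, so no residue is consumed by the split.
my @peptides = split /(?<=[KR])(?!P)/, $protein;

print "The peptide is $_\n" for @peptides;
```

With this input the K in LLAIKP is left uncut because a P follows it, and the lone R comes out as its own one-residue peptide instead of an empty string.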

On an unrelated note, the regex in the condition expression of the
    if ($protein =~ m/[K(?!P)|R(?!P)]/g) { ... }
block isn't doing what I think you think it's doing. The [K(?!P)|R(?!P)] character class is exactly equivalent to the [KPR()?!|] class; metacharacters (alternation, grouping, look-ahead, etc.) have no meaning in a character class, so ()?!| are just literal characters (and repeated characters have no effect whatsoever). Also, the /g modifier in the m//g match is useless in the boolean context of a conditional, although it does no harm (except to burn a few more innocent computrons). Again, none of this affects the basic problem with the code, which stems from the incorrect s/// match.
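
A quick way to see the difference is to run a few test strings through both patterns. This is only a sketch: the "intended" pattern [KR](?!P) is my guess at what was meant (a K or R not followed by P), and the test strings are invented.

```perl
use strict;
use warnings;

my $as_written = qr/[K(?!P)|R(?!P)]/;   # the character class from the post
my $intended   = qr/[KR](?!P)/;         # assumed intent: K or R, not followed by P

# Invented test strings: a bare P, a literal paren, K before P, K before A.
for my $s ('P', '(', 'AAKP', 'AAKA') {
    printf "%-5s as written: %-3s  intended: %s\n", $s,
        ($s =~ $as_written ? 'yes' : 'no'),
        ($s =~ $intended   ? 'yes' : 'no');
}
```

The class as written matches a bare P or a literal parenthesis, and it also accepts AAKP, because the look-ahead is never applied; the second pattern rejects all three and matches only AAKA.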

I use Data::Dumper all the time because I've been fooled by my data too many times.

Yea and amen brother, yea and amen.


Give a man a fish:  <%-{-{-{-<

Re^3: Bioinformatics: Regex loop, no output
by tonto (Friar) on Nov 17, 2015 at 21:05 UTC

    Thank you! I wondered whether I had understood what was wanted; later posts show that I hadn't. I shouldn't have posted that, and I'll stop myself next time.