in reply to Bioinformatics: Regex loop, no output

Using the power of map:

#! /usr/bin/perl -wl
my @proteins = qw(DAAAAATTLTTTAMTTTTTTCKMMFRPPPPPGGGGGGGGGGGG
                  ALTAMCMNVWEITYHKGSDVNRRASFAQPPPQPPPPLLAIKPASDASD);
my %seen = map {$_ => 1} @proteins;
print "Peptide $_" for grep !$seen{$_}++, map {split /[KR]\K(?!P)/} @proteins;
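To see what the split pattern does in isolation, here is a minimal sketch. The sub name digest and the short test string are made up for illustration and are not part of the code above. The pattern cuts after each K or R unless the next residue is P; \K (Perl 5.10 or later) keeps the matched K or R in the left-hand piece.

```perl
use strict;
use warnings;

# Hypothetical helper: cut after K or R, but not when the next residue is P.
sub digest {
    my ($protein) = @_;
    # \K drops the matched K/R from the separator, so the split point is
    # zero-width and the residue stays attached to the preceding piece.
    return split /[KR]\K(?!P)/, $protein;
}

# Made-up input: K before B (cut), R before P (no cut), R before D (cut).
print "Peptide $_\n" for digest('AAKBBRPCCRDD');
# prints:
# Peptide AAK
# Peptide BBRPCCR
# Peptide DD
```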

Re^2: Bioinformatics: Regex loop, no output
by AnomalousMonk (Archbishop) on Nov 15, 2015 at 23:33 UTC

    Something I don't understand here. The "uniquifying" action of the %seen hash prevents un-split, whole, original proteins (like AAAAAA in the example below) from getting through the dataflow "pipe" into the output, but it also means that split pieces occurring more than once in the input (e.g., AAAAK and AAAA below) are output only once. Is this bioinformatically useful?

    c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
    "my @proteins = qw(AAAAKAAAA AAAAKAAAA AAAAAA);
     ;;
     my %seen = map {$_ => 1} @proteins;
     ;;
     print qq{Peptide '$_'} for grep !$seen{$_}++, map {split /[KR]\K(?!P)/} @proteins;
     ;;
     dd \%seen;
     "
    Peptide 'AAAAK'
    Peptide 'AAAA'
    { AAAA => 2, AAAAAA => 2, AAAAK => 2, AAAAKAAAA => 1 }


    Give a man a fish:  <%-{-{-{-<