Again, GrandFather's code above already seems to print the information in essentially the way you want, except the formatting is different. So change the print formatting. Is this what you need help with?

On the other hand, you may mean that you want the peptides encapsulated into an independent data structure that you can pass around to any function at will. Here's an adaptation of GrandFather's code to produce a data structure associating proteins with their split peptides:

c:\@Work\Perl\monks>perl -wMstrict -le
"use Data::Dump qw(dd);
 ;;
 my @proteins = qw(
   DAAAAATTLTTTAMTTTTTTCKMMFRPPPPPGGGGGGGGGGGG
   ALTAMCMNVWEITYHKGSDVNRRASFAQPPPQPPPPLLAIKPASDASD
   DAAAAATTLTTTAMTTTTTTCK
   XXXXXXX
   );
 ;;
 my %protein_peptides;
 ;;
 for my $protein (@proteins) {
   my @peptides = split /(?<=[KR])(?!P)/, $protein;
   ;;
   next if @peptides < 2;
   ;;
   push @{ $protein_peptides{$protein} }, \@peptides
   }
 ;;
 dd \%protein_peptides;
 "
{
  ALTAMCMNVWEITYHKGSDVNRRASFAQPPPQPPPPLLAIKPASDASD => [
    ["ALTAMCMNVWEITYHK", "GSDVNR", "R", "ASFAQPPPQPPPPLLAIKPASDASD"],
  ],
  DAAAAATTLTTTAMTTTTTTCKMMFRPPPPPGGGGGGGGGGGG => [
    ["DAAAAATTLTTTAMTTTTTTCK", "MMFRPPPPPGGGGGGGGGGGG"]
  ],
}
I have reformatted the native output of Data::Dump::dd() as it appeared on my monitor to make it more readable. (Update: I like Data::Dump as my dumper, but you may prefer Data::Dumper, which is core.)

Note that the protein DAAAAATTLTTTAMTTTTTTCK does not appear in the output data structure. It ends in a K that is not followed by a P, so the pattern does match at the end of the string and you might expect an empty (or null) string as a final field, but split does not produce trailing null fields when called as it is in the code. (Update: Therefore, DAAAAATTLTTTAMTTTTTTCK yields only one field, is considered not to have been split at all, and so is skipped by the next if @peptides < 2 test.) See split for the rules about producing null trailing (and leading) fields. Note also that the protein XXXXXXX does not appear in the output structure because it contains no split point whatsoever.
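If it helps to see the trailing-field rule in isolation, here is a minimal sketch using the same pattern on that one protein. The variable names ($cut, @default, @keep_all) are mine, not from GrandFather's code; the only thing it relies on is the documented behavior that a negative LIMIT argument makes split keep trailing empty fields.

```perl
use strict;
use warnings;

my $protein = 'DAAAAATTLTTTAMTTTTTTCK';

# Same split point as in the code above: after K or R, unless followed by P.
my $cut = qr/(?<=[KR])(?!P)/;

# No LIMIT: the empty field after the final K is discarded.
my @default = split $cut, $protein;

# Negative LIMIT: trailing empty fields are preserved.
my @keep_all = split $cut, $protein, -1;

printf "no LIMIT: %d field(s)\n", scalar @default;
printf "LIMIT -1: %d field(s)\n", scalar @keep_all;
```

Whether a protein that is cut only at its very end should count as "split" is your call; if it should, you would need a different test than the field count.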

See Perl Data Structures Cookbook (perldsc) for more info on generating and accessing complex Perl data structures.
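As a rough illustration of passing the structure to a function and walking it, here is a sketch that assumes the same shape as the dd output above (a hash of arrays of arrays). The function name peptide_report is hypothetical and the hash is filled in by hand with one entry from that output just to keep the example self-contained.

```perl
use strict;
use warnings;

# One entry copied from the dd output above:
# protein => [ [ peptide, peptide, ... ] ]
my %protein_peptides = (
    DAAAAATTLTTTAMTTTTTTCKMMFRPPPPPGGGGGGGGGGGG => [
        [ 'DAAAAATTLTTTAMTTTTTTCK', 'MMFRPPPPPGGGGGGGGGGGG' ],
    ],
);

# Hypothetical helper: takes a reference to the hash and returns
# one formatted line per peptide set.
sub peptide_report {
    my ($data) = @_;
    my @lines;
    for my $protein (sort keys %$data) {
        for my $peptide_set (@{ $data->{$protein} }) {
            push @lines, sprintf '%s: %s', $protein, join ', ', @$peptide_set;
        }
    }
    return @lines;
}

print "$_\n" for peptide_report(\%protein_peptides);
```

The inner loop is there because the code above pushes an array reference onto each protein's entry, so each protein maps to a list of peptide sets rather than directly to a list of peptides.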


Give a man a fish:  <%-{-{-{-<

