So my general problem is one in bioinformatics: given the entire proteome of a species, in which each gene can be spliced into multiple protein isoforms of different lengths, pick only the longest protein isoform for each gene.

In other words, suppose there are genes A, B, C, etc., with 3, 1 and 3 protein isoforms respectively (a.1, a.2, a.3; b.1; c.2, c.5, c.7), of respective lengths 12, 11, 12, 15, 34, 12 and 45, each with its corresponding peptide sequence.

Then I want the Perl script to return ONLY a.1 or a.3 (they tie at 12) and its sequence for gene A, b.1 and its sequence for gene B, and c.7 and its sequence for gene C. You get the idea?
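A minimal sketch of that selection, using the isoform IDs and lengths from the example above. Assumptions: the gene name is everything before the last `.` in the isoform ID, and the sequences are dummy strings of `X` of the stated length (real peptide sequences would come from the proteome file). Ties are broken by keeping the alphabetically first ID, so gene A gets a.1.

```perl
use strict;
use warnings;

# Placeholder data: isoform ID => sequence. The lengths match the example;
# the 'X' strings are dummy sequences, not real peptides.
my %seq_of = (
    'a.1' => 'X' x 12, 'a.2' => 'X' x 11, 'a.3' => 'X' x 12,
    'b.1' => 'X' x 15,
    'c.2' => 'X' x 34, 'c.5' => 'X' x 12, 'c.7' => 'X' x 45,
);

# gene => ID of the longest isoform seen so far
my %longest;

# Sorting the IDs makes tie-breaking reproducible: with a strict '>'
# comparison, the first ID (alphabetically) of equal length is kept.
for my $id ( sort keys %seq_of ) {
    # Assumption: gene name = everything before the last '.'
    my ($gene) = $id =~ /^(.+)\.[^.]+$/ or next;

    if ( !exists $longest{$gene}
        || length( $seq_of{$id} ) > length( $seq_of{ $longest{$gene} } ) )
    {
        $longest{$gene} = $id;
    }
}

# Print the surviving isoforms in FASTA format.
for my $gene ( sort keys %longest ) {
    my $id = $longest{$gene};
    print ">$id\n$seq_of{$id}\n";
}
```

Run as-is, this keeps a.1, b.1 and c.7. A single pass with one hash per gene is enough here; there is no need to sort isoforms by length first.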

Your suggested syntax using grep worked, except that it returned all the matches as an array, so I used shift to take the first match. For my purposes the first match is as good as any other when several keys match one of the values in the array.
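For reference, a sketch of the grep-then-shift pattern described above. The exact code suggested earlier in the thread isn't shown here, so `%gene_of` (isoform ID => gene) and `@wanted` are hypothetical. One caveat: `keys` returns hash keys in no particular order, so the "first" match is arbitrary unless the keys are sorted. `first` from the core List::Util module is an alternative that stops at the first match instead of building the whole list.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical hash: isoform ID => gene name
my %gene_of = ( 'a.1' => 'A', 'a.3' => 'A', 'b.1' => 'B', 'c.7' => 'C' );

# Hypothetical array of values to look up
my @wanted = ( 'A', 'C' );

my %first_key_for;
for my $value (@wanted) {
    # grep returns every key whose value matches, as a list;
    # sorting makes the "first" match reproducible.
    my @matches = sort { $a cmp $b } grep { $gene_of{$_} eq $value } keys %gene_of;
    $first_key_for{$value} = shift @matches;
}

# Same idea with List::Util::first, which returns only the first hit.
my $first_a = first { $gene_of{$_} eq 'A' } sort keys %gene_of;

print "$_ => $first_key_for{$_}\n" for sort keys %first_key_for;
print "first for A: $first_a\n";
```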

Thanks for the useful syntax; I had not come across it yet in the Beginning Perl (3rd edition) book, and it's only been 10 days since I started teaching myself Perl! So your help is much appreciated...

I can now finish my "non-redundification" Perl script, which removes protein isoform redundancy by keeping ONLY the longest isoform for each gene in a proteome. My script is ~100 lines long, which I think would be a joke for you Monks! But hey, I am just a Padawan learner as of now! :)
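Since a proteome usually arrives as a FASTA file, here is a sketch of reading one into an ID => sequence hash, which a per-gene "keep the longest" pass can then work on. Assumptions: the isoform ID is the first whitespace-delimited word after `>` on each header line, sequences may wrap over several lines, and `read_fasta` is a hypothetical helper name. The toy input is an in-memory string rather than a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from an open filehandle into a
# hash of ID => sequence. The ID is taken as the first word after '>'.
sub read_fasta {
    my ($fh) = @_;
    my ( %seq_of, $id );
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless length $line;    # skip blank lines
        if ( $line =~ /^>/ ) {
            ($id) = $line =~ /^>(\S+)/;    # undef if the header has no ID
            $seq_of{$id} = '' if defined $id;
        }
        elsif ( defined $id ) {
            $line =~ s/\s+//g;             # drop stray whitespace (incl. \r)
            $seq_of{$id} .= $line;         # sequences may span several lines
        }
    }
    return \%seq_of;
}

# Toy input (hypothetical): an in-memory "file" with a wrapped sequence.
my $fasta = ">a.1 some description\nMKV\nLLA\n>a.2\nMK\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
my $seq_of = read_fasta($fh);
close $fh;

# Report each isoform and its length.
printf "%s\t%d\n", $_, length $seq_of->{$_} for sort keys %$seq_of;
```

With a real file, `open my $fh, '<', $path` replaces the in-memory open; the rest stays the same.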

Thanks again to both of you!


In reply to Re^2: Extract hash keys for values stoted in array by onlyIDleft
in thread Extract hash keys for values stoted in array by onlyIDleft
