davi54 has asked for the wisdom of the Perl Monks concerning the following question:
Hi, I have a very basic question. I have a list of protein sequences in which each entry has a header line followed by the sequence itself (all uppercase letters), and each entry is separated from the next by a blank line (as shown in the example below). Is there a way for a script to read each uppercase sequence and output its length and the number of times the letter A occurs in it? For the two example sequences below:
>sp|O24310|EFTU_PEA Elongation factor Tu, chloroplastic OS=Pisum sativum OX=3888 GN=TUFA PE=2 SV=1
MALSSTAATTSSKLKLSNPPSLSHTFTASASASVSNSTSFR
>sp|Q43467|EFTU1_SOYBN Elongation factor Tu, chloroplastic OS=Glycine max OX=3847 GN=TUFA PE=3 SV=1
MAVSSATASSKLILLPHASSSSSLNSTPFRSSTTNTHKLTPADSTHNIKL
I want the output to look like:
Sequence: MALSSTAATTSSKLKLSNPPSLSHTFTASASASVSNSTSFR
Length: 41
A: 6
Sequence: MAVSSATASSKLILLPHASSSSSLNSTPFRSSTTNTHKLTPADSTHNIKL
Length: 50
A: 5
Any help would be appreciated. Thank you so much.
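One possible approach, as a minimal sketch rather than a definitive solution: treat every line starting with `>` as a header, join the non-blank lines that follow it into one sequence, then use `length` for the size and `tr/A//` to count the letter A. The helper names `sequence_stats` and `_summarize` are hypothetical, and the sketch assumes the input is FASTA-style text held in a string (a sequence wrapped over several lines is joined before counting).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes FASTA-style text and returns one
# [sequence, length, number-of-A] array ref per entry.
sub sequence_stats {
    my ($text) = @_;
    my @stats;
    my $seq = '';
    for my $line (split /\n/, $text) {
        $line =~ s/\s+\z//;              # drop trailing whitespace, including \r
        if ($line =~ /^>/) {             # a header line starts a new entry
            push @stats, _summarize($seq) if length $seq;
            $seq = '';
        }
        elsif (length $line) {           # blank lines are skipped
            $seq .= $line;               # join sequences wrapped over several lines
        }
    }
    push @stats, _summarize($seq) if length $seq;   # flush the last entry
    return @stats;
}

# Hypothetical helper: tr/A// with an empty replacement list counts
# the matching characters without modifying the string.
sub _summarize {
    my ($seq) = @_;
    my $count_a = ($seq =~ tr/A//);
    return [ $seq, length $seq, $count_a ];
}

# Demo using the two entries from the question.
my $fasta = <<'END';
>sp|O24310|EFTU_PEA Elongation factor Tu, chloroplastic OS=Pisum sativum OX=3888 GN=TUFA PE=2 SV=1
MALSSTAATTSSKLKLSNPPSLSHTFTASASASVSNSTSFR

>sp|Q43467|EFTU1_SOYBN Elongation factor Tu, chloroplastic OS=Glycine max OX=3847 GN=TUFA PE=3 SV=1
MAVSSATASSKLILLPHASSSSSLNSTPFRSSTTNTHKLTPADSTHNIKL
END

for my $entry (sequence_stats($fasta)) {
    my ($seq, $len, $count_a) = @$entry;
    print "Sequence: $seq\nLength: $len\nA: $count_a\n";
}
```

To run this against a real file, the text could be read into a string first (for example by slurping the file) and passed to `sequence_stats` in the same way.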