Hi, I have a very basic question. I have a list of protein sequences in which each entry has a header line followed by the actual sequence (all uppercase letters), and each entry is separated from the next by a blank line (as shown in the example below). Is there a way for a script to read each uppercase sequence and output its length and the number of times the letter A occurs in it? For the two example sequences below:
>sp|O24310|EFTU_PEA Elongation factor Tu, chloroplastic OS=Pisum sativum OX=3888 GN=TUFA PE=2 SV=1
MALSSTAATTSSKLKLSNPPSLSHTFTASASASVSNSTSFR
>sp|Q43467|EFTU1_SOYBN Elongation factor Tu, chloroplastic OS=Glycine max OX=3847 GN=TUFA PE=3 SV=1
MAVSSATASSKLILLPHASSSSSLNSTPFRSSTTNTHKLTPADSTHNIKL
I want the output to look like:
Sequence: MALSSTAATTSSKLKLSNPPSLSHTFTASASASVSNSTSFR
Length: 41
A: 6
Sequence: MAVSSATASSKLILLPHASSSSSLNSTPFRSSTTNTHKLTPADSTHNIKL
Length: 50
A: 5
Any help would be appreciated. Thank you so much.
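One possible approach, sketched in Python since the post does not name a language: treat every line starting with `>` as a header, join the lines that follow it into one sequence, and then report the length and the count of the letter A. The function names `parse_fasta` and `report` are made up for this illustration, and the sketch assumes a sequence may wrap over several lines and that blank separator lines can be ignored.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-style lines.

    Assumption: a header line starts with '>' and every following
    non-blank line belongs to that entry's sequence.
    """
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank separator lines carry no data
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def report(lines, residue="A"):
    """Return the output lines in the format requested in the question."""
    out = []
    for _header, seq in parse_fasta(lines):
        out.append(f"Sequence: {seq}")
        out.append(f"Length: {len(seq)}")
        out.append(f"{residue}: {seq.count(residue)}")
    return out
```

To run it on a file, something like `with open("proteins.fasta") as fh: print("\n".join(report(fh)))` should work, where `proteins.fasta` is a placeholder file name. Passing a different `residue` counts another letter instead of A.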