Hello monks!
I have a file with entries as follows:
>ttt
SEQ:MKKTAIAIAVALAGFATVAQAAPKDNTWYTGAKLGWSQYHDTGFINNNGPTHENQLGAGAFGGYQ
+
LBL:IIIIIIIIIIIIIIIIIIIIIIIIIIIMMMMMMMMMMMOOOOOOOOOOOOOOOMMMMMMMMMMMI
+
LBL:IIIIIIIIIIIIIIIIIIIIIIIIIIIMMMMMMMMOOOOOOOOOOOOOOOOOOOOOOMMMMMMMM
+
LBL:IIIIIIIIIIIIIIIIIIIIIIIIIIIIIMMMMMMMMMMMMMOOOOOOOOOOOMMMMMMMMMMMI
+
LBL:IIIIIIIIIIIIIIIIIIIIIIIIIIIMMMMMMMMMOOOOOOOOOOOOOOOOOOOOMMMMMMMMI
What I must do is store all lines beginning with LBL (the easy part) and then create a consensus label line with one label for each letter in SEQ (M, K, K, T, A, I, etc.). The consensus line will consist of I, M and O, like all the LBL lines. To decide which letter to use at a given position, we assign a label to the SEQ letter if the majority of LBL lines agree on it at that position.
For example, if for the letter B of SEQ, the first LBL line says I, the second I, the third M and the fourth I, we will use I in the consensus label line.
I thought of using hashes, but I have a problem working out how to store each label. Please advise...
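
To make the majority rule concrete, here is a rough sketch of one way it could work. It is written in Python purely for illustration; the function names `read_labels` and `consensus` and the shortened four-letter record are made up for the example. It assumes every LBL line has the same length as SEQ, and that a tie goes to whichever label is seen first, which the description above does not specify. Rather than a hash keyed by label, it keeps the LBL strings in a list and counts the labels column by column.

```python
from collections import Counter

def read_labels(lines):
    """Collect the label strings from lines that start with 'LBL:'."""
    prefix = "LBL:"
    return [line.strip()[len(prefix):] for line in lines if line.startswith(prefix)]

def consensus(labels):
    """Return the most common label at each position.

    zip(*labels) walks the label strings column by column and stops at
    the shortest string. Ties go to the label seen first (an assumption).
    """
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*labels)
    )

# Hypothetical, shortened record in the same layout as the real file.
record = """>ttt
SEQ:MKKT
+
LBL:IIMO
+
LBL:IIMM
+
LBL:IMMO
+
LBL:IIOO
""".splitlines()

labels = read_labels(record)
print(consensus(labels))  # prints IIMO
```

The second column of the toy record is the I, I, M, I case from the example above, and it comes out as I. In Perl the same idea would probably be an array of label strings plus a small count hash per position, so the hash holds counts rather than the labels themselves.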