I am trying to write a script that will accept an alignment of two protein sequences as an input, and given a residue in one sequence return the analogous (aligned) residue in the other sequence.
For example, consider the following alignment output:
SeqA: 36-190 SeqB: 206-334 Length: 155 %identity: 55.137
SeqA: LTIEAVPSNAAEGKEVLLLVHNLPQDPRGYNWYKGETVDANRRIIGYVISNQQITPGPAYSNRETIYPNASLLMRNVTRNDTGSYTLQVIKLNLMSEEVTGQ-FSVHPETPKPSISSNNSNPVEDKDAVAFTCEPETQNTTYLWWVNGQSLPVSP
SeqB: PTISPSYTYYRPGVNLSLSCHAASNPPAQYSWLIDGNIQQHTQE---------------------------LFISNITEKNSGLYTCQANNSASGHSRTTVKTITVSAELPKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSP
^
Here Sequence A consists of residues numbered 36-190 (represented by single-letter amino acid code, i.e. the sequence begins with leucine (L) 36 and ends with proline (P) 190), with one gap (represented by "-"), and Sequence B consists of residues numbered 206-334 with a larger gap in the middle.
I would like to enter a residue in Sequence B and have the code return the corresponding residue in Sequence A. For instance, if the query was I255 (the residue indicated by the carat), the returned value should be V112 (the corresponding residue in Sequence A).
My code now splits each sequence string into an array, and iterates through the array for SeqB, assigning corresponding residues (e.g. V) and positions (e.g. 112) in Sequence A as values to a hash with the keys being the position in Sequence B, skipping over gaps ("-"), and taking the starting residue number into account.
I was surprised that I couldn't find any other examples of this problem, and was wondering if there was a better way of doing this. Please let me know if any of the above isn't clear.
PearlMonks has proven to be an invaluable resource, and I am very grateful. I look forward to receiving your wisdom as I work through this problem.
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |