comment on

Hello,

I am trying to write a script that will accept an alignment of two protein sequences as an input, and given a residue in one sequence return the analogous (aligned) residue in the other sequence.

For example, consider the following alignment output:

SeqA: 36-190	SeqB: 206-334	Length: 155	%identity: 55.137

SeqA: LTIEAVPSNAAEGKEVLLLVHNLPQDPRGYNWYKGETVDANRRIIGYVISNQQITPGPAYSNRETIYPNASLLMRNVTRNDTGSYTLQVIKLNLMSEEVTGQ-FSVHPETPKPSISSNNSNPVEDKDAVAFTCEPETQNTTYLWWVNGQSLPVSP
SeqB: PTISPSYTYYRPGVNLSLSCHAASNPPAQYSWLIDGNIQQHTQE---------------------------LFISNITEKNSGLYTCQANNSASGHSRTTVKTITVSAELPKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSP
                                                                                  ^

Here Sequence A consists of residues numbered 36-190 (represented by single-letter amino acid code, i.e. the sequence begins with leucine (L) 36 and ends with proline (P) 190), with one gap (represented by "-"), and Sequence B consists of residues numbered 206-334 with a larger gap in the middle.

I would like to enter a residue in Sequence B and have the code return the corresponding residue in Sequence A. For instance, if the query was I255 (the residue indicated by the carat), the returned value should be V112 (the corresponding residue in Sequence A).

My code now splits each sequence string into an array, and iterates through the array for SeqB, assigning corresponding residues (e.g. V) and positions (e.g. 112) in Sequence A as values to a hash with the keys being the position in Sequence B, skipping over gaps ("-"), and taking the starting residue number into account.

I was surprised that I couldn't find any other examples of this problem, and was wondering if there was a better way of doing this. Please let me know if any of the above isn't clear.

PearlMonks has proven to be an invaluable resource, and I am very grateful. I look forward to receiving your wisdom as I work through this problem.

In reply to Per residue sequence alignment - per character string comparison? by proteins

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.