gudluck has asked for the wisdom of the Perl Monks concerning the following question:
I have 10 or more very closely related chromosome (DNA) sequences (from different strains), each aligned pairwise to a single reference chromosome. How can I generate a multiple sequence alignment of all these sequences, together with the reference, without changing the individual alignments to the reference?
What I need to do is add dashes at the corresponding insert positions in the other sequences, so that I get a global alignment with respect to the reference. Note: I am NOT looking for sequence similarities here.
For example: if sequence 1 has an insert from position 100 to 125 that the other 9 sequences lack, then in the final expanded alignment I will insert 26 dashes (-) at positions 100-125 in sequences 2 to 10 and in the reference.
In another situation, if seq1 has an insert at 100-125, seq5 has an insert at 110-115, and seq8 has an insert at 112-119, then I need to insert dashes as above, except at the positions where a sequence has its own insert (110-115 in seq5 and 112-119 in seq8).
Please suggest the logic for a script to address this problem.

Input:

Original reference:
CGACAATGCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCC

Pairwise alignments:

Ref1: CGACAAT--GCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCC
Seq1: CGACAATAAGCACGACAGAGGAAGCAGAACAGATA-----ATTGCCTCTCATTTTC-CTCCC

Ref1: CGACAATGCACGACAGAGGAAGC--AGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCC
Seq2: CGACAAT-CACGACAGAGGAAGCTTAGAACAGATATTTAG---GCCTCTCATTTTCTCTCCC

Ref1: CGACAATGCACGACAGAGGAAG----CAGAACAGATATTTAGATTGCCTCTCA----TTTTCTCTCCC
Seq3: CGACAATGCACGACAGAGGAAGTTTTCAGAACAGATATTTAGATTGCCTCTCAAAAATTTTCTCTCCC

Output:

Final multiple sequence alignment:

Ref1: CGACAAT--GCACGACAGAGGAAG----C--AGAACAGATATTTAGATTGCCTCTCA----TTTTCTCTCCC
Seq1: CGACAATAAGCACGACAGAGGAAG----C--AGAACAGATA-----ATTGCCTCTCA----TTTTC-CTCCC
Seq2: CGACAAT---CACGACAGAGGAAG----CTTAGAACAGATATTTAG---GCCTCTCA----TTTTCTCTCCC
Seq3: CGACAAT--GCACGACAGAGGAAGTTTTC--AGAACAGATATTTAGATTGCCTCTCAAAAATTTTCTCTCCC
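
One possible way to express the dash-insertion logic described above, sketched in Python rather than Perl. This is only an illustration under stated assumptions, not a tested pipeline: the function names `split_by_ref` and `merge_alignments` and the row label `Ref1` are made up for this sketch, every gapped reference is assumed to equal the original reference once its dashes are removed, and when several sequences have inserts at the same reference position the inserts are simply left-justified and padded with dashes (no attempt is made to align the inserted bases to each other).

```python
def split_by_ref(ref_aln, seq_aln):
    """Split one pairwise alignment into reference-anchored pieces.

    Returns (inserts, columns):
      inserts[i] = bases the sequence inserts before reference base i
                   (0-based; i == len(reference) means after the last base)
      columns[i] = the character aligned to reference base i (base or '-')
    """
    if len(ref_aln) != len(seq_aln):
        raise ValueError("aligned strings must have equal length")
    inserts, columns = {}, []
    for r, s in zip(ref_aln, seq_aln):
        if r == "-":                      # gap in reference = insert in sequence
            slot = len(columns)           # number of reference bases seen so far
            inserts[slot] = inserts.get(slot, "") + s
        else:                             # match, mismatch, or deletion
            columns.append(s)
    return inserts, columns


def merge_alignments(reference, pairs, ref_name="Ref1"):
    """pairs maps name -> (gapped reference, gapped sequence).

    Returns name -> row of the merged alignment, reference row included.
    """
    rows = {ref_name: ({}, list(reference))}   # the reference has no inserts
    for name, (ref_aln, seq_aln) in pairs.items():
        if ref_aln.replace("-", "") != reference:
            raise ValueError(f"{name}: gapped reference does not match")
        rows[name] = split_by_ref(ref_aln, seq_aln)

    # Widest insert seen at each slot across all sequences.
    width = [0] * (len(reference) + 1)
    for inserts, _ in rows.values():
        for slot, bases in inserts.items():
            width[slot] = max(width[slot], len(bases))

    merged = {}
    for name, (inserts, columns) in rows.items():
        out = []
        for slot in range(len(reference) + 1):
            # Own insert (if any), padded with dashes to the slot width.
            out.append(inserts.get(slot, "").ljust(width[slot], "-"))
            if slot < len(reference):
                out.append(columns[slot])
        merged[name] = "".join(out)
    return merged


reference = "CGACAATGCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCC"
pairs = {
    "Seq1": ("CGACAAT--GCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCC",
             "CGACAATAAGCACGACAGAGGAAGCAGAACAGATA-----ATTGCCTCTCATTTTC-CTCCC"),
    "Seq2": ("CGACAATGCACGACAGAGGAAGC--AGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCC",
             "CGACAAT-CACGACAGAGGAAGCTTAGAACAGATATTTAG---GCCTCTCATTTTCTCTCCC"),
    "Seq3": ("CGACAATGCACGACAGAGGAAG----CAGAACAGATATTTAGATTGCCTCTCA----TTTTCTCTCCC",
             "CGACAATGCACGACAGAGGAAGTTTTCAGAACAGATATTTAGATTGCCTCTCAAAAATTTTCTCTCCC"),
}
merged = merge_alignments(reference, pairs)
for name, row in merged.items():
    print(f"{name}: {row}")
```

The idea is that each pairwise alignment is reduced to "what sits on each reference base" plus "what is inserted between reference bases", so the individual alignments to the reference are never changed; only padding dashes are added. For whole chromosomes the same logic applies, though storing only the non-empty insert slots (as the dictionary here does) matters more at that scale.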