Protein multiple sequence alignment

onlyIDleft has asked for the wisdom of the Perl Monks concerning the following question:

Dear all, I want to align several proteins, but here is the problem: I want the multiple alignment to use existing knowledge of the protein domains in each protein.

It seems that existing sophisticated alignment software like POA, A-Bruijn and ProDA is still blind to domain annotations. Perhaps I should mention that each domain can be considered an independent evolutionary unit, just as an amino acid in a protein or a nucleotide in DNA would be.

Is anyone here aware of any tools in Perl or BioPerl that can take in a multi-FASTA file and the Pfam domain data for the same sequences, and return an alignment that combines global alignment with local, domain-constrained alignment?

The final complication is that several of these proteins can have their domains in a rearranged linear order, so is there a way around that, like I think A-Bruijn attempts to do? But this problem is less important than the first task of aligning proteins under constraints from the domain annotation coordinates...

I realize that this might not be the right forum to ask heavy computational biology theory questions, but hey I am a newbie, so I am taking a stab at seeking your help anyways...

Replies are listed 'Best First'.
Re: Protein multiple sequence alignment by educated_foo (Vicar) on May 05, 2011 at 20:47 UTC
This is very much not a Perl question. You're trying to do a global multiple alignment with a penalty for misaligned domains? I'm not sure why this additional penalty would help, but I'm not the one working on the problem. One hack you could try is just to pad the domain boundaries with something, so that misaligning domains would cause mismatches. Write the padded sequences out, run your tool, and read the result back in. If that doesn't work, and if there isn't some existing tool that does what you want, you'll probably just have to write your own. I don't think this is a standard problem, so you'll probably end up here.
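
For what it's worth, the padding hack could be sketched roughly like this. It is in Python rather than Perl, since nothing about it is Perl-specific, and it is only a sketch under assumptions: the function names `pad_domains` and `unpad_alignment` are made up for illustration, domain coordinates are taken to be 1-based, inclusive and non-overlapping, and the padding is a run of tryptophan (`W`), chosen because W-W is the highest-scoring match in BLOSUM62. Whether a given aligner actually lines the runs up is something you would have to check. Running the aligner itself is left out.

```python
# Hypothetical marker: a run of W residues inserted at every domain boundary.
PAD = "W" * 5


def pad_domains(seq, domains, pad=PAD):
    """Insert `pad` at each domain boundary of `seq`.

    `domains` is a list of (start, end) pairs, assumed 1-based, inclusive
    and non-overlapping. Returns the padded sequence plus a mask that is
    True at padding positions, so genuine W residues survive later cleanup.
    """
    # Cut points as 0-based string offsets; a shared boundary gets one pad.
    cuts = sorted({s - 1 for s, _ in domains} | {e for _, e in domains})
    pieces, mask, prev = [], [], 0
    for cut in cuts:
        pieces.append(seq[prev:cut])
        mask.extend([False] * (cut - prev))
        pieces.append(pad)
        mask.extend([True] * len(pad))
        prev = cut
    pieces.append(seq[prev:])
    mask.extend([False] * (len(seq) - prev))
    return "".join(pieces), mask


def unpad_alignment(aligned, masks, gap="-"):
    """Remove padding from aligned rows while keeping them rectangular.

    Padding residues are first turned into gaps, then columns that
    contain only gaps are dropped from every row.
    """
    cleaned = []
    for row, mask in zip(aligned, masks):
        out, i = [], 0  # i indexes residues of the padded, ungapped sequence
        for ch in row:
            if ch == gap:
                out.append(gap)
            else:
                out.append(gap if mask[i] else ch)
                i += 1
        cleaned.append(out)
    keep = [c for c in range(len(cleaned[0]))
            if any(r[c] != gap for r in cleaned)]
    return ["".join(r[c] for c in keep) for r in cleaned]


padded, mask = pad_domains("MKTAYIAKQR", [(3, 6)], pad="WWW")
print(padded)  # MKWWWTAYIWWWAKQR
```

You would write the padded sequences to a FASTA file, run whatever aligner you like, read the aligned rows back in the same order, and pass them with the saved masks to `unpad_alignment`. The mask approach assumes the aligner keeps the residues of each sequence in order and does not reorder the sequences.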
