in reply to manipulating alignment data
>perl -wMstrict -le
"use List::Util qw(reduce);
 use vars qw($a $b);
 ;;
 my @seqs = qw(
   ATCG--ATCG-ATCG
   ATGC--ATCG-ATCG
   ATGC-A-TCG-ATCG
   ATGC--ATCG-ATCG
   ATCG--ATCG-AACG
 );
 ;;
 my $cons = reduce { $a ^ $b } @seqs;
 defined($cons) or die 'no sequences given';
 ;;
 my $mask = @seqs % 2 ? $seqs[0] : (qq{\0} x length $seqs[0]);
 $cons ^= $mask;
 $cons =~ tr{\0-\xff}{.X};
 print qq{'$cons'};
"
'..XX.XX.....X..'
The code assumes all sequences have the same length. (I'm not sure just what happens if they don't.) The use vars qw($a $b); statement suppresses some pesky "Name ... used only once: possible typo at..." messages.
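On the unequal-length question, here is a small check (a sketch, not part of the one-liner above). perlop documents that for string ^ and |, the shorter operand is treated as if it were padded on the right with zero bits, so the result is as long as the longer string. The variable names here are made up for illustration, and the snippet assumes the 'bitwise' feature is not in effect.

```perl
use strict;
use warnings;

# Illustration only: string XOR on operands of different lengths.
# Assumes the classic string bitwise operators, i.e. the 'bitwise'
# feature (turned on by "use v5.28" or later) is NOT enabled.
my $x = 'ATCG' ^ 'AT';    # 'AT' is treated as "AT\0\0"

# Same display mapping as in the one-liner: \0 becomes '.', anything else 'X'.
(my $shown = $x) =~ tr{\0-\xff}{.X};

print length($x), " '$shown'\n";    # prints: 4 '..XX'
```

So with sequences of different lengths, positions past the end of a shorter sequence are XORed as if that sequence held \0 there, and the result has to be read with that in mind.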
BTW: Both moritz and I get 2, 3, 5, 6 and 12 (not 8) as non-conserved positions. Is this correct?
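Important Update: Bug! Sorry, this solution is sensitive to the data. XOR only tracks parity, so any column in which every character occurs an even number of times XORs to zero and is reported as conserved. To see the problem, try the following sequence set, whose first column (C, C, A, A) wrongly comes out as conserved: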
my @seqs = qw( CTCG--ATCG-ATCG CTGC--ATCG-ATCG ATGC-A-TCG-ATCG ATGC--ATCG-ATCG );
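One possible way around the parity problem (a sketch under the same equal-length assumption, not a tested replacement for the one-liner) is to XOR every sequence against the first one and OR the differences together, so that a single mismatch in a column can never be cancelled by another. The sub name nonconserved_mask is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns a string with
# '.' for every column where all sequences agree and 'X' otherwise.
# Assumes all sequences have the same length and that the 'bitwise'
# feature is not enabled, so ^ and | act as string operators.
sub nonconserved_mask {
    my @seqs = @_;
    die 'no sequences given' unless @seqs;

    my $ref  = $seqs[0];
    my $diff = "\0" x length $ref;

    for my $seq (@seqs) {
        # The XOR is \0 wherever $seq matches the reference;
        # the OR keeps any mismatch seen so far.
        $diff |= $seq ^ $ref;
    }

    $diff =~ tr{\0-\xff}{.X};
    return $diff;
}

print "'", nonconserved_mask(qw(
    CTCG--ATCG-ATCG
    CTGC--ATCG-ATCG
    ATGC-A-TCG-ATCG
    ATGC--ATCG-ATCG
)), "'\n";    # prints: 'X.XX.XX........'
```

With the five sequences from the original example this gives '..XX.XX.....X..', the same as before, while the first column of the problem set is now flagged.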