in reply to Finding Consensus String

Since you're dealing with frequency, I'd use a hash to store the occurrences of A, C, G, and T in each position of the word. I'd loop over the DNA sequences for each character, looking at character 1 first in all strands, then character 2, etc.
sub consensus { my $len = length($_[0]); # all strands are of one length, right? my @cons; # loop over the positions in the strands for my $i (0 .. $len - 1) { my %freq; my $max = 0; # loop over character $i in each of the strands for (map substr($_, $i, 1), @_) { $freq{$_}++; # save the max value $max = $freq{$_} if $freq{$_} > $max; } # store only those who occur the most times push @cons, [grep $freq{$_} == $max, keys %freq]; } return \@cons; }
Then it's your task to turn the array of array references into the string you want.

Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart

Replies are listed 'Best First'.
Re^2: Finding Consensus String
by monkfan (Curate) on Oct 19, 2005 at 16:31 UTC
    Dear japhy,
    Thanks so much for your answer to my posting. I just have a question. Your code works well in generalize version of array like this:
    my @instances = ( 'AAAAA ATCGA', 'ATCGA AAAAA', 'ATAAA AAAAA', );
    But how can I make it more generalize for cases like this?
    my @instances2 = ( 'AAAAA ATCGA', 'ATCGA', 'ATAAA AAAAA', ); # consensus : ATAAA A[AT][CA][GA]A
    Or this:
    my @instances3 = ( 'AAAAA ATCGA TTTT', 'ATCGA TATA ', 'ATAAA AAAAA ATTA ', ); # consensus : ATAAA A[AT][CA][GA]A TTTA
    Cause, currently it gives warning when encountering two cases like that.

    Regards,
    Edward
      I don't get any warnings. What I do notice is that space is included in the [...] parts. So here's a small fix:
      sub consensus { ... # code from before, until: # store only those who occur the most times my @most = grep $freq{$_} == $max, keys %freq; @most = grep $_ ne " ", @most if @most > 1; push @cons, \@most; } return \@cons; }
      And to get the string version, I use:
      sub consensus_to_string { my $con = shift; my $str; for (@$con) { if (@$_ > 1) { $str .= "[" . join("", @$_) . "]" } else { $str .= $_->[0] } } return $str; }

      Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
      How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart