in reply to Can't think of an algorithm to go..

sub consensus { # Picks an arbitrary result from # among the highest in case of tie. my %counts; $counts{$_}++ for @_; return ( sort { $counts{$b} <=> $counts{$a} } keys(%counts) )[0]; } { chomp( my $seq = <> ); chomp( my @lbls = <> ); s/^SEQ:// for $seq; s/^LBL:// for @lbls; for my $i (0..length($seq)-1) { printf("%s %s\n", substr($seq, $i, 1), consensus( map substr($_, $i, 1), @lbls ), ); } }

Alternate output:

my $consensus = ''; for my $i (0..length($seq)-1) { $consensus .= consensus( map substr($_, $i, 1), @lbls ); } print("SEQ: $seq\n"); print("LBL: $consensus\n");

Contrary to rovf and AZed, it's my understanding that the consensus for the "K" at index 1 isn't affected by the labels for the "K" at index 2. Let me know if that's wrong.

Replies are listed 'Best First'.
Re^2: Can't think of an algorithm to go..
by Anonymous Monk on Jun 03, 2009 at 20:59 UTC
    Thanks a lot to both of you!
    Ikegami, although I am still trying to figure out what you code means, it works perfect!
    Thanks for your trouble!

      For each index (for my $i (0..length($seq)-1)), it gets the letter at that index in each label (map substr($_, $i, 1), @lbls).

      To calculate the consensus, the number of instances of each letter is counted ($counts{$_}++ for @_;). Then the letters (keys(%counts)) are sorted by descending number of instances (sort { $counts{$b} <=> $counts{$a} }), and one of those with the highest count is returned (( )[0]).