Re: Can't think of an algorithm to go..

sub consensus {
   # Picks an arbitrary result from
   # among the highest in case of tie.

   my %counts;
   $counts{$_}++ for @_;
   return ( sort { $counts{$b} <=> $counts{$a} } keys(%counts) )[0];
}

{
   chomp( my $seq  = <> );
   chomp( my @lbls = <> );

   s/^SEQ:// for $seq;
   s/^LBL:// for @lbls;

   for my $i (0..length($seq)-1) {
      printf("%s %s\n",
         substr($seq, $i, 1),
         consensus( map substr($_, $i, 1), @lbls ),
      );
   }
}
[download]

Alternate output:

   my $consensus = '';

   for my $i (0..length($seq)-1) {
      $consensus .= consensus( map substr($_, $i, 1), @lbls );
   }

   print("SEQ: $seq\n");
   print("LBL: $consensus\n");
[download]

Contrary to rovf and AZed, it's my understanding that the consensus for the "K" at index 1 isn't affected by the labels for the "K" at index 2. Let me know if that's wrong.

Comment on Re: Can't think of an algorithm to go.. Select or Download Code

Replies are listed 'Best First'.
Re^2: Can't think of an algorithm to go.. by Anonymous Monk on Jun 03, 2009 at 20:59 UTC
Thanks a lot to both of you! Ikegami, although I am still trying to figure out what you code means, it works perfect! Thanks for your trouble!	[reply]
Re^3: Can't think of an algorithm to go.. by ikegami (Patriarch) on Jun 03, 2009 at 21:23 UTC
For each index (`for my $i (0..length($seq)-1)`), it gets the letter at that index in each label (`map substr($_, $i, 1), @lbls`). To calculate the consensus, the number of instances of each letter is counted (`$counts{$_}++ for @_;`). Then the letters (`keys(%counts)`) are sorted by descending number of instances (`sort { $counts{$b} <=> $counts{$a} }`), and one of those with the highest count is returned (`( )[0]`).	[reply] [d/l] [select]