Challenge: Letter Power Revisited (Heuristics)

All,
An exhaustive search of all possible solutions for these problems would be unrealistic in perl due to the run time. If you aren't willing to write C or can't figure out some trick to divine the optimal solution without brute force, then you will have to fall back to a heuristic solution. I will present 3 different methods for choosing the initial best guess and a couple different refining methods for improving that guess.

Please note that these are snippets and not full solutions.

Initial Guess

Choose The Word From Each Category That Has The Highest Individual Score

This is actually BrowserUk's approach from the original challenge.

#! perl -slw
use strict;
use Time::HiRes qw[ time ];

my $start = time;

my @lookup = map{ ($_+1) * $_ /2 } 0 .. 80;
sub calcScore {
    my( %chars, $score );
    $chars{ $_ }++ for map{ split'' } @{ $_[0] };
    $score += $lookup[ $_ ] for values %chars;
    return $score;
}

my @cats = map{ [ split ' ' ] } <DATA>;
close DATA;

## Sort the words in each category by their individual scores (descend
+ing)
my %wordScores;
@{ $cats[ $_ ] } = sort{
    ( $wordScores{ $b } ||= calcScore [$b] )
    <=>
    ( $wordScores{ $a } ||= calcScore [$a] )
} @{ $cats[ $_ ] } for 0 .. $#cats;

## form our guess from the highest scoring words in each category
my @guess = map{ $cats[ $_ ][ 0 ] } 0 .. $#cats;
my $best = calcScore \@guess;
[download]

This is extremely fast but has the disadvantage of not considering the letter frequency in the other categories.

Choose The Word From Each Category That Has The Highest Density Of The Most Frequent Letters Across All Candidates

This was my first attempt and runs extremely fast. Unfortunately, it doesn't consider that only 1 choice may be made from each category so it is possible for 1 category to throw off the frequency distribution.

my (@candidate, %freq);
while (<DATA>) {
    chomp;
    my @list = split ' ';
    push @candidate, \@list;
    for my $word (@list) {
        ++$freq{$_} for split //, $word;
    }
}
my $tot = sum(values %freq);
$freq{$_} = ($freq{$_} / $tot) * 100 for keys %freq;

my (@solution, %let);
for my $list (@candidate) {
    my ($best, $item) = (0, '');
    for my $word (@$list) {
        my $score = sum(map $freq{$_}, split //, $word);
        ($best, $item) = ($score, $word) if $score > $best;
    }
    ++$let{$_} for split //, $item;
    push @solution, $item;
}
my $total = sum(map {my $n = $let{$_}; ($n * $n + $n) / 2} keys %let);
print "$total : [ @solution ]\n";
[download]

Choose The Word From Each Category That Has The Highest Density Of The Most Frequent Letters By Category

This is actually a refinement of the previous algorithm. Originally, if the letter 'e' made up 67% of all letters it would be heavily favored. This approach considers that the majority of those may be coming from 1 category where we can only choose 1 word. Now the letter 'e' would have to score high in the majority of categories for it to be so heavily favored.

my (@candidate, %freq);
while (<DATA>) {
    chomp;
    my @list = split ' ';
    push @candidate, \@list;
    for my $word (@list) {
        ++$freq{$#candidate}{$_} for split //, $word;
    }
}

for my $idx (keys %freq) {
    my $tot = sum(values %{$freq{$idx}});
    for my $let (keys %{$freq{$idx}}) {
        $freq{$idx}{$let} = ($freq{$idx}{$let} / $tot) * 100;
    }
}

for my $let ('a' .. 'z') {
    my $val = 0;
    for my $idx (0 .. $#candidate) {
        $val += $freq{$idx}{$let} if $freq{$idx}{$let};
    }
    $freq{$let} = $val;
}

my (@solution, %let);
for my $list (@candidate) {
    my ($best, $item) = (0, '');
    for my $word (@$list) {
        my $score = sum(map $freq{$_}, split //, $word);
        ($best, $item) = ($score, $word) if $score > $best;
    }
    ++$let{$_} for split //, $item;
    push @solution, $item;
}
my $best = sum(map {my $n = $let{$_}; ($n * $n + $n) / 2} keys %let);
print "$best : [ @solution ]\n";
[download]

As you can tell this code is pretty messy. I was more interested in proof of concept to compare answers rather than how clean the code was.

Improve Guess Through Refinement

Change One Category At A Time

This too is BrowserUk's approach from the original challenge. The basic idea is to see if changing the guess improves the score iteratively through each category. If yes, start over at the beginning.

## Loop over the categories swaping the other words in that cateory 
## for the guessed word. If we get a better score, start again.
LOOP:
for my $iCat ( 0 .. $#cats ) {
    my $cat = $cats[ $iCat ];
    for my $iWord ( 1 .. $#$cat ) {
        my $test = calcScore [ 
            @guess[ 0 .. $iCat -1 ], 
            $cat->[ $iWord ], 
            @guess[ $iCat+1 .. $#guess ] 
        ];
        if( $test > $best ) {
            $best = $test;
            $guess[ $iCat ] = $cat->[ $iWord ];
            print "$best : [ @guess ]";
            redo LOOP;
        }
    }
}
[download]

This did the job for the original puzzle presented by Tanktalus. I felt there were a couple of things that could be changed for a general purpose solution. The first is that there is no escape clause if improvements are still being found. If an extreme edge case is used as input, then it is possible that the initial guess is far from the mark and it may as well be doing an exhaustive search anyway. The second thing is that the loop starts over if an improvement is found in the 3rd category without checking the rest. It is possible that leaving category 3 alone and changing category 4 would lead to an even better score.

Explore All 1-Category Change Improvements (DFS With Escape Clause)

I extended BrowserUk's idea. Instead of starting over when an improvement is found, the improvement is pushed on to a work queue and the next category is checked. This effectively performs a depth-first search. Since this is even more likely to turn into an exhaustive search, I implemented two different escape clauses. The first limits the total number of iterations and the second limits the total run time. I will present the latter since it won't arbitrarily help or hinder results on platforms with more or less horse power.

my %seen;
eval {
    local $SIG{ALRM} = sub { die "Timed Out\n"; };
    alarm 60;
    my @work = [$best, \@solution];
    while (@work) {
        my $item = pop @work;
        my ($cur_best, $cur_guess) = @$item;
        for my $idx (0 .. $#candidate) {
            for my $word (@{$candidate[$idx]}) {
                my @new_guess = @$cur_guess;
                $new_guess[$idx] = $word;
                next if $seen{"@new_guess"}++;
                my $new_tot = calc_score(@new_guess);
                if ($new_tot > $cur_best) {
                    push @work, [$new_tot, \@new_guess];
                    if ($new_tot > $best) {
                        $best = $new_tot;
                        print "$best : [ @new_guess ]\n";
                    }
                }
            }
        }
    }
    alarm 0;
};
[download]

Further Exploration

Well, I hope someone finds this challenge interesting and presents a solution that dazzles and amazes. If you are looking for even more inspiration, there are several nodes in the original challenge worth looking at.

Cheers - L~R

Comment on Challenge: Letter Power Revisited (Heuristics) Select or Download Code