comment on

All,
An exhaustive search of all possible solutions for these problems would be unrealistic in perl due to the run time. If you aren't willing to write C or can't figure out some trick to divine the optimal solution without brute force, then you will have to fall back to a heuristic solution. I will present 3 different methods for choosing the initial best guess and a couple different refining methods for improving that guess.

Please note that these are snippets and not full solutions.

Initial Guess

Choose The Word From Each Category That Has The Highest Individual Score

This is actually BrowserUk's approach from the original challenge.

#! perl -slw
use strict;
use Time::HiRes qw[ time ];

my $start = time;

my @lookup = map{ ($_+1) * $_ /2 } 0 .. 80;
sub calcScore {
    my( %chars, $score );
    $chars{ $_ }++ for map{ split'' } @{ $_[0] };
    $score += $lookup[ $_ ] for values %chars;
    return $score;
}

my @cats = map{ [ split ' ' ] } <DATA>;
close DATA;

## Sort the words in each category by their individual scores (descend
+ing)
my %wordScores;
@{ $cats[ $_ ] } = sort{
    ( $wordScores{ $b } ||= calcScore [$b] )
    <=>
    ( $wordScores{ $a } ||= calcScore [$a] )
} @{ $cats[ $_ ] } for 0 .. $#cats;

## form our guess from the highest scoring words in each category
my @guess = map{ $cats[ $_ ][ 0 ] } 0 .. $#cats;
my $best = calcScore \@guess;
[download]

This is extremely fast but has the disadvantage of not considering the letter frequency in the other categories.

Choose The Word From Each Category That Has The Highest Density Of The Most Frequent Letters Across All Candidates

This was my first attempt and runs extremely fast. Unfortunately, it doesn't consider that only 1 choice may be made from each category so it is possible for 1 category to throw off the frequency distribution.

my (@candidate, %freq);
while (<DATA>) {
    chomp;
    my @list = split ' ';
    push @candidate, \@list;
    for my $word (@list) {
        ++$freq{$_} for split //, $word;
    }
}
my $tot = sum(values %freq);
$freq{$_} = ($freq{$_} / $tot) * 100 for keys %freq;

my (@solution, %let);
for my $list (@candidate) {
    my ($best, $item) = (0, '');
    for my $word (@$list) {
        my $score = sum(map $freq{$_}, split //, $word);
        ($best, $item) = ($score, $word) if $score > $best;
    }
    ++$let{$_} for split //, $item;
    push @solution, $item;
}
my $total = sum(map {my $n = $let{$_}; ($n * $n + $n) / 2} keys %let);
print "$total : [ @solution ]\n";
[download]

Choose The Word From Each Category That Has The Highest Density Of The Most Frequent Letters By Category

This is actually a refinement of the previous algorithm. Originally, if the letter 'e' made up 67% of all letters it would be heavily favored. This approach considers that the majority of those may be coming from 1 category where we can only choose 1 word. Now the letter 'e' would have to score high in the majority of categories for it to be so heavily favored.

my (@candidate, %freq);
while (<DATA>) {
    chomp;
    my @list = split ' ';
    push @candidate, \@list;
    for my $word (@list) {
        ++$freq{$#candidate}{$_} for split //, $word;
    }
}

for my $idx (keys %freq) {
    my $tot = sum(values %{$freq{$idx}});
    for my $let (keys %{$freq{$idx}}) {
        $freq{$idx}{$let} = ($freq{$idx}{$let} / $tot) * 100;
    }
}

for my $let ('a' .. 'z') {
    my $val = 0;
    for my $idx (0 .. $#candidate) {
        $val += $freq{$idx}{$let} if $freq{$idx}{$let};
    }
    $freq{$let} = $val;
}

my (@solution, %let);
for my $list (@candidate) {
    my ($best, $item) = (0, '');
    for my $word (@$list) {
        my $score = sum(map $freq{$_}, split //, $word);
        ($best, $item) = ($score, $word) if $score > $best;
    }
    ++$let{$_} for split //, $item;
    push @solution, $item;
}
my $best = sum(map {my $n = $let{$_}; ($n * $n + $n) / 2} keys %let);
print "$best : [ @solution ]\n";
[download]

As you can tell this code is pretty messy. I was more interested in proof of concept to compare answers rather than how clean the code was.

Improve Guess Through Refinement

Change One Category At A Time

This too is BrowserUk's approach from the original challenge. The basic idea is to see if changing the guess improves the score iteratively through each category. If yes, start over at the beginning.

## Loop over the categories swaping the other words in that cateory 
## for the guessed word. If we get a better score, start again.
LOOP:
for my $iCat ( 0 .. $#cats ) {
    my $cat = $cats[ $iCat ];
    for my $iWord ( 1 .. $#$cat ) {
        my $test = calcScore [ 
            @guess[ 0 .. $iCat -1 ], 
            $cat->[ $iWord ], 
            @guess[ $iCat+1 .. $#guess ] 
        ];
        if( $test > $best ) {
            $best = $test;
            $guess[ $iCat ] = $cat->[ $iWord ];
            print "$best : [ @guess ]";
            redo LOOP;
        }
    }
}
[download]

This did the job for the original puzzle presented by Tanktalus. I felt there were a couple of things that could be changed for a general purpose solution. The first is that there is no escape clause if improvements are still being found. If an extreme edge case is used as input, then it is possible that the initial guess is far from the mark and it may as well be doing an exhaustive search anyway. The second thing is that the loop starts over if an improvement is found in the 3rd category without checking the rest. It is possible that leaving category 3 alone and changing category 4 would lead to an even better score.

Explore All 1-Category Change Improvements (DFS With Escape Clause)

I extended BrowserUk's idea. Instead of starting over when an improvement is found, the improvement is pushed on to a work queue and the next category is checked. This effectively performs a depth-first search. Since this is even more likely to turn into an exhaustive search, I implemented two different escape clauses. The first limits the total number of iterations and the second limits the total run time. I will present the latter since it won't arbitrarily help or hinder results on platforms with more or less horse power.

my %seen;
eval {
    local $SIG{ALRM} = sub { die "Timed Out\n"; };
    alarm 60;
    my @work = [$best, \@solution];
    while (@work) {
        my $item = pop @work;
        my ($cur_best, $cur_guess) = @$item;
        for my $idx (0 .. $#candidate) {
            for my $word (@{$candidate[$idx]}) {
                my @new_guess = @$cur_guess;
                $new_guess[$idx] = $word;
                next if $seen{"@new_guess"}++;
                my $new_tot = calc_score(@new_guess);
                if ($new_tot > $cur_best) {
                    push @work, [$new_tot, \@new_guess];
                    if ($new_tot > $best) {
                        $best = $new_tot;
                        print "$best : [ @new_guess ]\n";
                    }
                }
            }
        }
    }
    alarm 0;
};
[download]

Further Exploration

Well, I hope someone finds this challenge interesting and presents a solution that dazzles and amazes. If you are looking for even more inspiration, there are several nodes in the original challenge worth looking at.

Cheers - L~R

In reply to Challenge: Letter Power Revisited (Heuristics) by Limbic~Region
in thread Challenge: Letter Power Revisited by Limbic~Region

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.