All,
Tanktalus presented an interesting puzzle at Challenge: Letter Power. Given a number of categories, choose a unique word that fits each category to yield the highest score possible. Each letter used is worth 1 point the first time it is encountered, 2 points the second etc in a progressive fashion. In other words, if there were only two categories and you chose "moose" and "mother", the total score would be 16.

In the example presented by Tanktalus, the candidate word list to choose from had already been tailored to favor a high density of the letter a. I wondered what would happen if you were to generalize the problem. We both agreed that it would be too difficult to expect a computer to find words appropriate to fit a random category so instead, I have chosen to choose words at random.

I have already explored a number of heuristic approaches for making a "best guess", refinement algorithms to improve a "best guess" and a naive brute force implementation in C to perform an exhaustive search to produce the guaranteed optimal solution. I will present all of that in replies to this main thread for those looking for inspiration or validation.

Challenge:

Develop a pure perl solution that, in roughly 60 seconds, will provide a high scoring solution to a given puzzle. I will be providing a number of canned puzzles with the known best solution as well as code you can use to generate your own. There is no penalty for prep work behind the scenes so if you have some ingenious idea to construct a database of solutions or what not - by all means. Just keep in mind that the presented solution must be pure perl and finish within roughly 60 seconds.

Note: I will be adding my own contributions later today but most of my ideas have already been discussed over in the original challenge.

Cheers - L~R

Replies are listed 'Best First'.
Challenge: Letter Power Revisited (Brute Force)
by Limbic~Region (Chancellor) on Apr 20, 2009 at 14:24 UTC
    All,
    Here is my brute force C implementation which on my medicore machine can do over 10 million guesses per second if compiled with gcc -O3 -funroll-loops brute_force.c I never bothered to learn C properly so please excuse my infantile attempt:

    Cheers - L~R

      You might want to give sizes to the arrays cat1 - cat10. At present, you are writing all over memory that you don't intend to.

      G. Wade
        gwadej,
        Thanks. I actually have some very bad perl that generates very naive C. Since the perl can know at runtime the number of elements, it does provide sizes. I didn't want the post to be about my lack of knowledge of C so I shared it as a blank template so that those who wanted to could modify it appropriately.

        Cheers - L~R

Challenge: Letter Power Revisited (Sample Puzzles)
by Limbic~Region (Chancellor) on Apr 20, 2009 at 14:25 UTC
    All,
    Here are some sample puzzles with the known best solution provided so you can judge how well you are doing. I may add more as I find time.

    Cheers - L~R

Challenge: Letter Power Revisited (Generate Your Own Puzzle)
by Limbic~Region (Chancellor) on Apr 20, 2009 at 14:26 UTC
    All,
    Assuming you have a file called 'words.txt' that contains 1 word per line, this code can be used to generate a random puzzle.
    #!/usr/bin/perl use strict; use warnings; use List::Util 'shuffle'; my $num_cat = 20; my $max_len = 10; my $min_sol = 6; my $max_sol = 30; my @word; open(my $fh, '<', 'words.txt') or die "Unable to open 'words.txt' for +reading: $!"; while (<$fh>) { tr/a-z//cd; next if length($_) > $max_len; push @word, $_; } @word = shuffle(@word); for my $cat (1 .. $num_cat) { my $count = int(rand ($max_sol - ($min_sol - 1))) + $min_sol; my @cat = splice(@word, 0, $count); print "@cat\n"; }

    Cheers - L~R

Challenge: Letter Power Revisited (Heuristics)
by Limbic~Region (Chancellor) on Apr 20, 2009 at 22:27 UTC
    All,
    An exhaustive search of all possible solutions for these problems would be unrealistic in perl due to the run time. If you aren't willing to write C or can't figure out some trick to divine the optimal solution without brute force, then you will have to fall back to a heuristic solution. I will present 3 different methods for choosing the initial best guess and a couple different refining methods for improving that guess.

    Please note that these are snippets and not full solutions.

    Cheers - L~R

Re: Challenge: Letter Power Revisited
by delirium (Chaplain) on Apr 21, 2009 at 20:02 UTC
    I'm a little rusty, but this looked fun and I took a stab at it. Here's a convaluted, winding, unoptimized memory hog that generates a high scoring solution, but not always the highest possible score given arbitrary lines of input...
    #!/usr/bin/perl -w use strict; my @lines=<DATA>; my %hash = (); my %highscores = (); my @final_words = (); my $final_score = 0; my $longword = ''; for my $letter ( "a" .. "z" ) { my $line = ''; for (@lines) { $line = $_; my @words = split; for my $word (@words) { my $tmp = scalar grep {/$letter/} split (//,$word); if (!$hash{$line}{$letter}{highscore} || $hash{$line}{$letter}{high +score} < $tmp) { $hash{$letter}{$line}{highscore} = $tmp; $hash{$letter}{$line}{word} = $word; $hash{$line}{$letter}{highscore} = $tmp; $hash{$line}{$letter}{word} = $word; } } } $highscores{$letter} = 0; $highscores{$letter} += $hash{$letter}{$_}{highscore} for keys %{$has +h{$letter}}; if (!$highscores{bigletter}{score} || $highscores{bigletter}{score} < + $highscores{$letter}) { $highscores{bigletter}{score} = $highscores{$letter}; $highscores{bigletter}{letter} = $letter; } } my $bl = $highscores{bigletter}{letter}; print "Letter with highest score is \"$bl\" with $highscores{bigletter +}{score} occurrences\n"; for my $line (keys %{$hash{$bl}}) { push @final_words,$hash{$line}{$bl}{word}; } print "\nWords chosen:\n"; print "$_\n" for @final_words; $longword .= $_ for @final_words; my @final_letters = split (//,$longword); for my $letter ( "a" .. "z" ) { my $count = grep {/$letter/} @final_letters; $final_score += ($count * $count + $count) / 2; } print "\nFinal score: $final_score\n"; __DATA__ alabama arkansas alaska delaware hawaii indiana kansas montana delawar +e sazerac sangrita radiator gastank engine heater fender wheel detent battery clutch mirr +or window alloy amalgam vanadium copper steel rutabaga limabean cress carrot sorrel squash cabbage pepper lettuce be +et leek celery endive rhubarb parsnip pumpkin azalia camellia dahlia gardenia gentian vervain canna hepatica bluebel +l anemone oleander lasagna macaroni pastina gnocchi tortelli alfabeto mascagni britten menotti unamas tacotime pizzahut tacobell panago tacomayo edojapan hardees caramel marzipan taiglach taffy brittle fondant toffee dragee __OUTPUT__ Letter with highest score is "a" with 26 occurrences Words chosen: alabama unamas sazerac radiator lasagna amalgam caramel azalia mascagni rutabaga Final score: 457
    Note the score is less than the shown maximum of 474 for this data set.