in reply to Challenge: Letter Power

My first attempt was brute force and unremarkable; it found the 474 after processing 95% of the combinations and took just under 8 hours to run.

My second finds the 474 in 0.015 seconds.

P:\test>499358-2
460 : [ alabama sangrita radiator amalgam rutabaga camellia lasagna mascagni tacomayo marzipan ]
472 : [ alabama sangrita radiator amalgam rutabaga azalia lasagna mascagni tacomayo marzipan ]
474 : [ alabama sangrita radiator amalgam rutabaga azalia lasagna mascagni tacomayo taiglach ]
Elapsed: 0.015 seconds.

See comments for the strategy used.

#! perl -slw
use strict;
use Time::HiRes qw[ time ];

my $start = time;

my @lookup = map{ ($_+1) * $_ /2 } 0 .. 80;

sub calcScore {
    my( %chars, $score );
    $chars{ $_ }++ for map{ split'' } @{ $_[0] };
    $score += $lookup[ $_ ] for values %chars;
    return $score;
}

my @cats = map{ [ split ' ' ] } <DATA>;
close DATA;

## Sort the words in each category by their individual scores (descending)
my %wordScores;
@{ $cats[ $_ ] } = sort{
    ( $wordScores{ $b } ||= calcScore [$b] )
        <=>
    ( $wordScores{ $a } ||= calcScore [$a] )
} @{ $cats[ $_ ] } for 0 .. $#cats;

## Form our guess from the highest scoring word in each category
my @guess = map{ $cats[ $_ ][ 0 ] } 0 .. $#cats;
my $best = calcScore \@guess;

## Loop over the categories, swapping each of the other words in that
## category in for the guessed word. If we get a better score, start again.
LOOP: for my $iCat ( 0 .. $#cats ) {
    my $cat = $cats[ $iCat ];
    for my $iWord ( 1 .. $#$cat ) {
        my $test = calcScore [
            @guess[ 0 .. $iCat -1 ],
            $cat->[ $iWord ],
            @guess[ $iCat+1 .. $#guess ]
        ];
        if( $test > $best ) {
            $best = $test;
            $guess[ $iCat ] = $cat->[ $iWord ];
            print "$best : [ @guess ]";
            redo LOOP;
        }
    }
}

printf "Elapsed: %.3f seconds.\n", time - $start;

__DATA__
alabama arkansas alaska delaware hawaii indiana kansas montana delaware
sazerac sangrita
radiator gastank engine heater fender wheel detent battery clutch mirror window
alloy amalgam vanadium copper steel
rutabaga limabean cress carrot sorrel squash cabbage pepper lettuce beet leek celery endive rhubarb parsnip pumpkin
azalia camellia dahlia gardenia gentian vervain canna hepatica bluebell anemone oleander
lasagna macaroni pastina gnocchi tortelli alfabeto
mascagni britten menotti
unamas tacotime pizzahut tacobell panago tacomayo edojapan hardees
caramel marzipan taiglach taffy brittle fondant toffee dragee
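
To save reverse engineering calcScore: a letter that occurs n times across the chosen words scores n*(n+1)/2, and a guess scores the sum over all its letters. A minimal standalone illustration of just that rule, using two words from the data above (not part of the program itself):

use strict;
use warnings;

## The scoring rule on its own: a letter occurring n times scores n*(n+1)/2.
my @lookup = map{ ($_+1) * $_ /2 } 0 .. 80;

my %chars;
$chars{ $_ }++ for map{ split '' } qw[ alabama amalgam ];

my $score = 0;
$score += $lookup[ $_ ] for values %chars;

print "$score\n";   ## a=7 -> 28, m=3 -> 6, l=2 -> 3, b=1 -> 1, g=1 -> 1 : total 39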

Re^2: Challenge: Letter Power
by Limbic~Region (Chancellor) on Apr 17, 2009 at 00:15 UTC
    BrowserUk,
    I really like this heuristic approach. Take a best guess and then see if it can be refined to get a better score. I am thinking about posting a "Challenge: Letter Power Revisited" thread, so I have been considering how to generate puzzles at random. Within just 3 tries, I was able to find an example where your solution does not produce the best answer:

    My heuristic solution is very similar to yours. Rather than picking the best individual score from each group, I calculate the letter frequency across all words. Then I select the word from each group that has the highest score based on that frequency analysis. I have also modified the refinement loop to limit the number of loops to handle the edge case where picking the best individual scores is the worst thing to do.

    LOOP: for (1 .. $max_loops) {
        # your refinement loop
        if ($better) {
            # update best
            next LOOP;
        }
        # end of your refinement loop
        last;
    }
    This way you pick a "good enough" score in a reasonable time frame no matter what.
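
    Roughly, the initial pick looks like this (a simplified sketch, not the exact code I ran; it reuses @cats and calcScore from your program, and the variable names are invented):

    ## Simplified sketch of the frequency-weighted initial guess.
    ## Count how often each letter occurs across every word in every
    ## category, then take from each category the word whose letters
    ## are collectively the most common.
    my %freq;
    for my $cat ( @cats ) {
        $freq{ $_ }++ for map{ split '' } @$cat;
    }

    my @guess;
    for my $cat ( @cats ) {
        my( $pick, $pickWeight ) = ( $cat->[ 0 ], -1 );
        for my $word ( @$cat ) {
            my $weight = 0;
            $weight += $freq{ $_ } for split '', $word;
            ( $pick, $pickWeight ) = ( $word, $weight ) if $weight > $pickWeight;
        }
        push @guess, $pick;
    }
    my $best = calcScore \@guess;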

    I am now going to try to figure out if I can do a smart exhaustive search that guarantees the best solution in a reasonable runtime (less than 2 hours). If I can, I will post a new thread comparing naive brute force, optimized brute force, heuristic, and optimized heuristic approaches.

    Cheers - L~R

      It can be shown that a better solution exists (not necessarily the best):

      438 : [ eyesore sleekest steepen spindles baseless rockiest refiners sheered tethers exhausts ]

      Yep! That's the problem with heuristic solutions. Unless you also run a full brute force on the same data, you can never be sure that you haven't found a local maximum.

      On the dataset above, your 438 is better than my 434, but neither is as good as:

      441 : [ rendered spindles squires tethers rockiest eyesore sheered sleekest steepen refiners ]

      But even using all 4 cpus flat out, my brute force solution can just about manage half a million combinations per minute. Given there are 7.75e9 combinations to try (7.75e9 / 5e5 ≈ 15,500 minutes), I'm projecting over 10 days to verify the heuristic for that one dataset. That's too much for the sake of an intellectual challenge with no useful purpose.


        BrowserUk,
        That's the problem with heuristic solutions.

        Heuristic solutions can be improved. For instance, the following modification to your refinement algorithm shows an even better solution:

        It finds the following in under a second and finishes off the work queue in about 17 seconds.

        446 : [ odysseys sleekest steepen spindles baseless marshals pastels stalest shambles exhausts ]
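
        Purely to give the flavour (an illustrative sketch only, not the exact modification I ran; it borrows @cats, @guess and calcScore from your program): instead of redo-ing the greedy pass on the first improvement, improved guesses go onto a work queue and each one is refined in turn.

        ## Illustrative sketch of a work-queue refinement.
        my @queue = ( [ @guess ] );
        my %seen  = ( "@guess" => 1 );
        my $best  = calcScore \@guess;

        while( my $cur = shift @queue ) {
            for my $iCat ( 0 .. $#cats ) {
                for my $word ( @{ $cats[ $iCat ] } ) {
                    my @trial = @$cur;
                    $trial[ $iCat ] = $word;
                    next if $seen{ "@trial" }++;        ## skip guesses already tried
                    my $score = calcScore \@trial;
                    if( $score > $best ) {
                        $best  = $score;
                        @guess = @trial;
                        push @queue, [ @trial ];        ## refine this guess further later
                        print "$best : [ @guess ]";
                    }
                }
            }
        }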

        I am still looking for a trick given a specific dictionary (60K words) to be able to produce a guaranteed optimal solution in a reasonable run time (see How many words does it take? for an example of such a trick). In the interim I have been exploring a number of heuristic solutions and refinement algorithms.

        Your heuristic approach with my refinement is always able to tie or beat the "best" solutions of the other heuristic approaches. Unfortunately, in the cases when there is a tie, the other approaches reach the optimal solution faster. I am trying to figure out if there is a way to prune the work queue above so that it produces the same results faster.

        If you are interested in the approaches I am exploring (either heuristic or guaranteed optimal), let me know. I am seldom interested in activities with "useful purposes", so don't feel obliged to indulge me if you aren't interested :-)

        Cheers - L~R