in reply to Re^3: Challenge: Letter Power
in thread Challenge: Letter Power

BrowserUk,
That's the problem with heuristic solutions.

Heuristic solutions can be improved. For instance, the following modification to your refinement algorithm shows an even better solution:

## Explore all improvement paths from the current best guess ## Since multiple paths are explored simultaneously - only print overa +ll improvements ## Limit execution time to specified timeframe my %seen; eval { local $SIG{ALRM} = sub { die "Timed Out\n"; }; alarm 60; my @work = [$best, \@guess]; while (@work) { my $item = pop @work; my ($cur_best, $cur_guess) = @$item; for my $idx (0 .. $#cats) { for my $word (@{$cats[$idx]}) { my @new_guess = @$cur_guess; $new_guess[$idx] = $word; next if $seen{"@new_guess"}++; my $new_tot = calcScore(\@new_guess); if ($new_tot > $cur_best) { push @work, [$new_tot, \@new_guess]; if ($new_tot > $best) { $best = $new_tot; print "$best : [ @new_guess ]\n"; } } } } } alarm 0; };

It finds the following in under a second and finishes off the work queue in about 17.

446 : [ odysseys sleekest steepen spindles baseless marshals pastels s +talest shambles exhausts ]

I am still looking for a trick given a specific dictionary (60K words) to be able to produce a guaranteed optimal solution in a reasonable run time (see How many words does it take? for an example of such a trick). In the interim I have been exploring a number of heuristic solutions and refinement algorithms.

Your heuristic approach with my refinement is always able to tie or beat the "best" solutions of the other heuristic approaches. Unfortunately, in the cases when there is a tie, the other approaches reach the optimal solution faster. I am trying to figure out if there is a way to prune the work queue above so that it produces the same results faster.

If you are interested in the approaches I am exploring (either for heuristic or guaranteed optimal) let me know. I am seldom interesting in activities with "useful purposes" so don't feel obliged to indulge me if you aren't interested :-)

Cheers - L~R

Replies are listed 'Best First'.
Re^5: Challenge: Letter Power
by BrowserUk (Patriarch) on Apr 18, 2009 at 16:44 UTC
    Heuristic solutions can be improved.

    You're still missing the point. You can only know when to stop looking for an improvement (either through changes to the algorithm or further iterations), if you have some target to aim for. Maybe 446 is the best for this particular dataset. Or maybe, there is another 2 point gain to be had? Or a 20 point gain? Or 200? Maybe there are hundreds of better solutions to this particular dataset. Without running an exhaustive search you have no way of judging how good any particular heuristic is for any given dataset.

    And whilst your modifications find better solutions for this dataset, maybe my original will work better for a different dataset? Or many different datasets. And without an exhaustive search of a representative sample of datasets, you've no way to compare the efficacy of different heuristics.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      BrowserUk,
      You're still missing the point.

      I don't believe I am.

      You can only know when to stop looking for an improvement...

      For the sake of argument, let's distinguish between "initial best guess after preliminary analysis" and "refining that guess to find better solutions". Of course you are going to decide to stop looking for an improvement otherwise you might as well do an exhaustive search. The question is what types of "initial guessing" and "refinement approaches" lead to the "best guess" across varied data. If the refinement approach only limits changing the guess of 1 category at a time then of course there is a limit to improvements that can be made.

      Without running an exhaustive search you have no way of judging how good any particular heuristic is for any given dataset.

      I am not trying to judge how good a heuristic algorithm is against the optimal solution - just how good they compare to each other and how quickly they converge on the "best guess". I have already created a way to generate puzzles of large varied data and have a way to automate the scoring so it is easy to see how they behave.

      And whilst your modifications find better solutions for this dataset, maybe my original will work better for a different dataset?

      Hmm. That would be impossible. I just extended your original refinement algorithm. You iteratively went through and said after an initial guess, would the score improve if I change the guess for category X (looping in order). If yes, you started over. My modification doesn't start over - it just pushes that improvement on the work queue and says - if instead of modifying the guess to category X - would I improve my score if I changed category Y instead. In other words, it too checks improvements of changing just 1 guess at a time but unlike yours, it checks all possible improvements. I was just hoping there was a way to use prior knowledge (already using a %seen hash) to know that a particular path couldn't possibly lead to a score better than the current best.

      And without an exhaustive search of a representative sample of datasets, you've no way to compare the efficacy of different heuristics.

      Well, for now I am comparing them to each other and to be honest, they all perform very close to one another if they use my modified refinement algorithm. I am however thinking about a solution that will produce a guaranteed optimal solution without performing an exhaustive search by limiting the input to a specific set of conditions such as the dictionary used, length of words, number of categories, etc. It may be impossible but I learn from the exploration regardless and I have been successful in the past.

      Feel free not to reply. I was serious about not needing to indulge me.

      Cheers - L~R

        Since you don't seem to be able to see beyond a pissing contest between my heuristic and my heuristic stuck in a loop, I'll try to make the point another way.

        All you are doing at the moment is comparing relative numbers for whichever heuristics you decide to test. And on a miniscule sample of possible datasets. But that gives you no basis upon which to draw any meaningful conclusions.

        • The best you can say is: "heuristic A finds a better solution that heuristic B--for this dataset"
        • Or: "heuristic A find the same solution as heuristic B more quickly--for this dataset."

        But in either case, the best solution you have found could be abysmally bad.

        So what is the point in being able to say that heuristic A converges on an abysmally bad solution 0.1 seconds more quickly than heuristic B?

        For a heuristic to be useful (or even just vaguely meaningful), you have to be able to say something to the effect of: "This heuristic will find the optimal (or one of the top ten most optimal) solution in 90% of cases".

        Until you have some way of determining that kind of baseline, comparing trivial modifications of a single heuristic, or two wildly different heuristics, serves little purpose. And you cannot form a baseline without either an exhaustive search or some sound mathematical way of determining or estimating the optimal solution.

        If you're really bent on pursuing an algorithm for solving the original problem, then lookup the literature on Linear Programming with particular attention to 'longest path' and 'critical path' analysis in a network. These have known solutions that will guarentee to find an optimal solution faster than a brute force search. Though probably not if coded in Perl.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.