in reply to Re^2: Challenge: Letter Power
in thread Challenge: Letter Power

It can be shown that a better solution exists (not necessarily the best):

438 : [ eyesore sleekest steepen spindles baseless rockiest refiners sheered tethers exhausts ]

Yep! That's the problem with heuristic solutions. Unless you also run a full brute-force search on the same data, you can never be sure that you haven't found some local maximum.

On the dataset above, your 438 is better than my 434, but neither is as good as:

441 : [ rendered spindles squires tethers rockiest eyesore sheered sleekest steepen refiners ]

But even using all 4 CPUs flat out, my brute-force solution can manage only about half a million combinations per minute. Given there are 7.75e9 combinations to try, I'm projecting over 10 days to verify the heuristic for that one dataset. That's too much for the sake of an intellectual challenge with no useful purpose.
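The 10-day projection follows directly from the two figures quoted above; a quick sanity check (assuming 7.75e9 combinations and half a million per minute):

```python
# Sanity check of the projection: 7.75e9 combinations at ~0.5e6/minute.
total = 7.75e9          # combinations to try
rate = 0.5e6            # combinations per minute (all 4 CPUs flat out)
minutes = total / rate  # 15,500 minutes
days = minutes / (60 * 24)
print(round(days, 1))   # → 10.8
```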


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^4: Challenge: Letter Power
by Limbic~Region (Chancellor) on Apr 18, 2009 at 14:18 UTC
    BrowserUk,
    That's the problem with heuristic solutions.

    Heuristic solutions can be improved. For instance, the following modification to your refinement algorithm shows an even better solution:

    It finds the following in under a second and finishes off the work queue in about 17 seconds.

    446 : [ odysseys sleekest steepen spindles baseless marshals pastels stalest shambles exhausts ]

    I am still looking for a trick that, given a specific dictionary (60K words), would produce a guaranteed optimal solution in a reasonable run time (see How many words does it take? for an example of such a trick). In the interim I have been exploring a number of heuristic solutions and refinement algorithms.

    Your heuristic approach with my refinement is always able to tie or beat the "best" solutions of the other heuristic approaches. Unfortunately, in the cases when there is a tie, the other approaches reach the optimal solution faster. I am trying to figure out if there is a way to prune the work queue above so that it produces the same results faster.
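    One generic way to prune such a work queue (a branch-and-bound sketch, not code from this thread; the function names and numbers are hypothetical) is to skip any queued state whose optimistic upper bound cannot beat the best complete score found so far:

```python
# Branch-and-bound style pruning sketch (hypothetical; not the poster's code).
# A queued state is worth expanding only if an optimistic estimate of its
# final score could still exceed the best complete score seen so far.

def upper_bound(current_score, best_possible_gains):
    """Optimistic bound: current score plus the best conceivable gain
    from every category that could still be changed."""
    return current_score + sum(best_possible_gains)

def should_prune(current_score, best_possible_gains, best_so_far):
    """True if this state cannot possibly beat the incumbent."""
    return upper_bound(current_score, best_possible_gains) <= best_so_far

print(should_prune(420, [5, 3], 446))  # 428 <= 446 → True, safe to drop
print(should_prune(440, [5, 3], 446))  # 448 >  446 → False, keep exploring
```

    The hard part, of course, is finding a bound that is both cheap to compute and tight enough to prune anything.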

    If you are interested in the approaches I am exploring (either heuristic or guaranteed optimal), let me know. I am seldom interested in activities with "useful purposes", so don't feel obliged to indulge me if you aren't interested :-)

    Cheers - L~R

      Heuristic solutions can be improved.

      You're still missing the point. You can only know when to stop looking for an improvement (either through changes to the algorithm or through further iterations) if you have some target to aim for. Maybe 446 is the best for this particular dataset. Or maybe there is another 2-point gain to be had? Or a 20-point gain? Or 200? Maybe there are hundreds of better solutions to this particular dataset. Without running an exhaustive search you have no way of judging how good any particular heuristic is for any given dataset.

      And whilst your modifications find better solutions for this dataset, maybe my original will work better for a different dataset? Or many different datasets. And without an exhaustive search of a representative sample of datasets, you've no way to compare the efficacy of different heuristics.


        BrowserUk,
        You're still missing the point.

        I don't believe I am.

        You can only know when to stop looking for an improvement...

        For the sake of argument, let's distinguish between "initial best guess after preliminary analysis" and "refining that guess to find better solutions". Of course you are going to decide to stop looking for an improvement at some point; otherwise you might as well do an exhaustive search. The question is which types of "initial guessing" and "refinement approaches" lead to the "best guess" across varied data. If the refinement approach only allows changing the guess for one category at a time, then of course there is a limit to the improvements that can be made.

        Without running an exhaustive search you have no way of judging how good any particular heuristic is for any given dataset.

        I am not trying to judge how good a heuristic algorithm is against the optimal solution - just how the heuristics compare to each other and how quickly they converge on the "best guess". I have already created a way to generate puzzles over large, varied data, and a way to automate the scoring, so it is easy to see how they behave.

        And whilst your modifications find better solutions for this dataset, maybe my original will work better for a different dataset?

        Hmm. That would be impossible: I just extended your original refinement algorithm. Yours iterated through the categories in order and asked, after an initial guess, "would the score improve if I changed the guess for category X?" If yes, it started over. My modification doesn't start over - it pushes that improvement onto the work queue and then also asks whether changing category Y instead of category X would improve the score. In other words, it too only changes one guess at a time, but unlike yours it checks all possible single-category improvements. I was just hoping there was a way to use prior knowledge (I'm already using a %seen hash) to know that a particular path couldn't possibly lead to a score better than the current best.
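        The queue-based, single-swap refinement described above can be sketched like this (a minimal Python illustration, not the original Perl; the `score` function, per-category `candidates` lists, and tuple representation are hypothetical stand-ins):

```python
from collections import deque

def refine(initial, candidates, score):
    """Queue-based single-category refinement (illustrative sketch).

    initial    - starting guess: one word per category, as a tuple
    candidates - per-category lists of alternative words
    score      - hypothetical scoring function for a complete guess

    Rather than restarting after the first improving swap, every
    improving single-category change is pushed onto a work queue,
    so all single-swap improvement paths get explored exactly once.
    """
    best, best_score = initial, score(initial)
    seen = {initial}                # plays the role of the %seen hash
    queue = deque([initial])
    while queue:
        guess = queue.popleft()
        for i, options in enumerate(candidates):
            for word in options:   # try changing just category i
                neighbour = guess[:i] + (word,) + guess[i + 1:]
                if neighbour in seen:
                    continue
                seen.add(neighbour)
                s = score(neighbour)
                if s > score(guess):           # improving swap: keep exploring
                    queue.append(neighbour)
                    if s > best_score:
                        best, best_score = neighbour, s
    return best, best_score
```

        A toy run with score = total letter count: `refine(("a", "c"), [["a", "bb"], ["c", "ddd"]], lambda g: sum(len(w) for w in g))` climbs from the initial guess to `("bb", "ddd")` with score 5.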

        And without an exhaustive search of a representative sample of datasets, you've no way to compare the efficacy of different heuristics.

        Well, for now I am comparing them to each other and, to be honest, they all perform very close to one another when they use my modified refinement algorithm. I am, however, thinking about a solution that would produce a guaranteed optimal result without performing an exhaustive search, by limiting the input to a specific set of conditions such as the dictionary used, the length of words, the number of categories, etc. It may be impossible, but I learn from the exploration regardless, and I have been successful in the past.

        Feel free not to reply. I was serious about not needing to indulge me.

        Cheers - L~R