in reply to Re^5: Challenge: Letter Power
in thread Challenge: Letter Power

BrowserUk,
You're still missing the point.

I don't believe I am.

You can only know when to stop looking for an improvement...

For the sake of argument, let's distinguish between "initial best guess after preliminary analysis" and "refining that guess to find better solutions". Of course you are going to decide to stop looking for an improvement at some point; otherwise you might as well do an exhaustive search. The question is which combinations of "initial guessing" and "refinement approaches" lead to the "best guess" across varied data. If the refinement approach only allows changing the guess for one category at a time, then of course there is a limit to the improvements that can be made.

Without running an exhaustive search you have no way of judging how good any particular heuristic is for any given dataset.

I am not trying to judge how good a heuristic algorithm is against the optimal solution - just how the heuristics compare to each other and how quickly they converge on the "best guess". I have already created a way to generate puzzles from large, varied data and a way to automate the scoring, so it is easy to see how they behave.

And whilst your modifications find better solutions for this dataset, maybe my original will work better for a different dataset?

Hmm. That would be impossible. I just extended your original refinement algorithm. Yours iterated through the categories in order and asked, after an initial guess: would the score improve if I changed the guess for category X? If yes, it started over. My modification doesn't start over - it pushes that improvement onto the work queue and also asks: instead of modifying the guess for category X, would the score improve if I changed category Y instead? In other words, it too checks improvements of changing just one guess at a time, but unlike yours, it checks all possible single-category improvements. I was just hoping there was a way to use prior knowledge (it already uses a %seen hash) to know that a particular path couldn't possibly lead to a score better than the current best.

And without an exhaustive search of a representative sample of datasets, you've no way to compare the efficacy of different heuristics.

Well, for now I am comparing them to each other and, to be honest, they all perform very close to one another when they use my modified refinement algorithm. I am, however, thinking about a solution that will produce a guaranteed optimal solution without performing an exhaustive search, by limiting the input to a specific set of conditions such as the dictionary used, length of words, number of categories, etc. It may be impossible, but I learn from the exploration regardless, and I have been successful in the past.

Feel free not to reply. I was serious about not needing to indulge me.

Cheers - L~R

Re^7: Challenge: Letter Power
by BrowserUk (Patriarch) on Apr 19, 2009 at 10:52 UTC

    Since you don't seem to be able to see beyond a pissing contest between my heuristic and my heuristic stuck in a loop, I'll try to make the point another way.

    All you are doing at the moment is comparing relative numbers for whichever heuristics you decide to test. And on a minuscule sample of possible datasets. But that gives you no basis upon which to draw any meaningful conclusions.

    • The best you can say is: "heuristic A finds a better solution than heuristic B--for this dataset."
    • Or: "heuristic A finds the same solution as heuristic B more quickly--for this dataset."

    But in either case, the best solution you have found could be abysmally bad.

    So what is the point in being able to say that heuristic A converges on an abysmally bad solution 0.1 seconds more quickly than heuristic B?

    For a heuristic to be useful (or even just vaguely meaningful), you have to be able to say something to the effect of: "This heuristic will find the optimal solution (or one of the ten best) in 90% of cases".

    Until you have some way of determining that kind of baseline, comparing trivial modifications of a single heuristic, or two wildly different heuristics, serves little purpose. And you cannot form a baseline without either an exhaustive search or some sound mathematical way of determining or estimating the optimal solution.

    If you're really bent on pursuing an algorithm for solving the original problem, then look up the literature on Linear Programming, with particular attention to 'longest path' and 'critical path' analysis in a network. These have known solutions that are guaranteed to find an optimal solution faster than a brute force search. Though probably not if coded in Perl.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      BrowserUk,
      Since you don't seem to be able to see beyond a pissing contest between my heuristic and my heuristic stuck in a loop....

      I am honestly not trying to be thick-headed, but I do not see what pissing contest you are referring to, and for the record, nothing is stuck in a loop. I saw a way to refine your algorithm and asked for your input if you were interested. You could have simply said that you weren't.

      I am sorry that you do not share my appreciation for exploring for exploration's sake, but don't tell me there is no value in it. I have certainly learned a lot - something that may even prove useful in a practical sense in the future.

      As I have alluded to throughout this thread, I am exploring both heuristic solutions and guaranteed optimal solutions. Since you have made it clear that a heuristic solution only has value to you if it can be compared to the known optimal solution over a large enough data set, I wrote an extremely infantile C brute force implementation that is able to check 3,129,836.24 solutions per second on a single CPU without any compiler optimizations. In other words, it took only 41 minutes to prove the best solution for this data is 446. For me, 41 minutes is still too long, so I am exploring other techniques to get under 20 minutes - but you weren't interested in that either.

      Your solution, with my refinement, has found the optimal solution within 60 seconds for every random sampling I have been able to do today.

      I won't continue to waste your time since you aren't interested. If you want to see the C (or the perl code to generate it) then feel free to /msg me.

      Update: Adding compiler options gets the C brute force solution to complete all 7,758,767,016 possible solutions in 12 minutes and 24 seconds. I guess I don't need an ingenious solution for that part (10,417,514.7 solutions per second is fast enough for my purposes).

      Cheers - L~R

        You could have simply said that you weren't interested.

        I did. Both via /msg, and repeatedly in this thread. And I gave you my reasons why not.

        [the heuristic] has found the optimal solution within 60 seconds for every random sampling
        the C brute force solution to complete all 7,758,767,016 possible solutions in 12 minutes and 24 seconds.

        So what value has the heuristic solution?

        Nobody is going to have a need to solve these puzzles in such large numbers that waiting 15 minutes for the solution is a problem. So being able to say that this heuristic may give you a solution in 60 seconds isn't useful. For a heuristic to be useful there has to be some mathematical basis for believing it can be trusted, as with the Rabin modification of the Miller primality test.

        Had you once in your replies, offline or on, indicated that you accepted the need for a reasonably performant brute force mechanism, then I might have been interested in that. That, as a whole or in parts, could be useful for tackling other, more practical NP-hard problems.

        For example, I've a C implementation of the combs() iterator (probably better named arrangements?) from my original post that cycles through the 7.75e9 arrangements in 45 seconds (single threaded). And with suitable pre-encoding of the words, I'm reasonably confident that scoring those arrangements shouldn't take much more than double that time. If I'm right, and the brute forcing can be achieved in under 2 minutes, what value is a sub-60-second maybe?

        With threading, that might be shortened further, and that's something that's right up my alley.

