in reply to Re: Hangman Assistant
in thread Hangman Assistant

blokhead,
Some more thoughts:

Your strategy is effectively a binary search. Assuming a perfect distribution (it is always possible to find a letter that is in exactly half of the remaining words), you should be able to make, on average, twice as many guesses as you are allowed wrong guesses before losing the game. Of course, the last guess still has a 50% chance of being wrong, so I believe your strategy is guaranteed to work only when the total number of initial candidates is 2^(2G - 1) or less, where G is the number of allowed wrong guesses. For instance, when G = 7 you are guaranteed (100%) to win when the initial number of candidates is 8,192 or less, and have only a 50% chance when it is up to 16,384. For 'eclectic', there are 9,638 initial words in my word list, so only a 50% chance of success.
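Here is a quick sanity check of that arithmetic - nothing more than the formula above, tabulated for each G:

    use strict;
    use warnings;

    # With G allowed wrong guesses and a perfect 50/50 split on every
    # guess, a guaranteed win needs at most 2**(2*G - 1) initial
    # candidates; 2**(2*G) gives only even odds on the final guess.
    for my $g (1 .. 7) {
        printf "G = %d: guaranteed up to %6d words, 50/50 up to %6d\n",
            $g, 2 ** (2 * $g - 1), 2 ** (2 * $g);
    }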

My strategy is optimized not to guess wrong. After only 5 guesses (2 right and 3 wrong) it had narrowed the search space down to exactly 1 word. This is because I prune by position, not just by presence of a letter, so even a successful guess on a popular letter can still effectively shrink the search space. Your approach would be improved by this strategy as well, and I don't think the opportunities stop there.
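In case the positional pruning isn't clear, here is a rough sketch of it (the sub name and interface are mine, not my assistant's actual code):

    use strict;
    use warnings;

    # Keep only words consistent with the revealed pattern after guessing
    # $letter. The pattern uses '_' for unknown slots (e.g. "st___"); a
    # surviving word must match every revealed slot exactly and must NOT
    # hide the guessed letter in an unrevealed slot.
    sub prune_by_position {
        my ($pattern, $letter, @words) = @_;
        my @keep;
        WORD: for my $word (@words) {
            next WORD unless length($word) == length($pattern);
            for my $i (0 .. length($pattern) - 1) {
                my $p = substr $pattern, $i, 1;
                my $w = substr $word,    $i, 1;
                if ($p eq '_') {
                    next WORD if $w eq $letter;  # letter can't hide here
                }
                else {
                    next WORD if $w ne $p;       # revealed slots must match
                }
            }
            push @keep, $word;
        }
        return @keep;
    }

So prune_by_position('st_e_', 'e', @candidates) keeps only the words with an E in the fourth slot and nowhere else - a correct guess still cuts the list.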

In a private /msg with Lawliet, the idea of finding the solution in the fewest total guesses was proposed. I indicated that my strategy would change - but to what? A binary search is already optimal for that. There needs to be some balance, then, between improving your odds of guessing right beyond 50/50 and still effectively reducing your search space each time. That way you can survive long enough to win without guessing ad nauseam.

I am kicking around some ideas where you still look for a very popular letter (say, one in 70% of the remaining words), but one whose positions would still split that 70% in half if you guessed right. The result would be that you guess wrong 30% of the time and remove 70% of your search space, or guess right 70% of the time and reduce your search space by 65% (only half of the 70% that contain the letter survive, so 35% remain). Does this sound viable to you?
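A rough sketch of how those numbers could be tabulated per letter (the names and structure here are hypothetical, not code from my assistant):

    use strict;
    use warnings;

    # For each unguessed letter, tally how many remaining words contain it
    # and how evenly the containing words split across the letter's
    # distinct positional signatures.
    sub letter_stats {
        my ($words, $guessed) = @_;
        my %stats;
        for my $letter ('a' .. 'z') {
            next if $guessed->{$letter};
            my %bucket;
            my $absent = 0;
            for my $word (@$words) {
                my $sig = join '',
                    map { substr($word, $_, 1) eq $letter ? '1' : '0' }
                    0 .. length($word) - 1;
                $sig =~ /1/ ? $bucket{$sig}++ : $absent++;
            }
            my $present = @$words - $absent;
            next unless $present;
            my ($biggest) = sort { $b <=> $a } values %bucket;
            $stats{$letter} = {
                p_right        => $present / @$words,  # odds the guess is right
                worst_if_right => $biggest,            # biggest surviving bucket
                left_if_wrong  => $absent,             # words left after a miss
            };
        }
        return \%stats;
    }

The letter I describe above would be one with p_right near 0.7 and worst_if_right near half of the words that contain it.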

Cheers - L~R

Re^3: Hangman Assistant
by blokhead (Monsignor) on Jul 13, 2009 at 18:14 UTC
    That makes sense. I hadn't considered also taking into account the positions of the letters when you guess a correct letter. So in my example, when the word has a Q and you guess U, you can still potentially get some useful information if many of the candidate words have U appearing in different places (i.e., you could distinguish QUEUE from QUEST). In this case, it would be "best" (if minimizing the total # of guesses) to choose a letter whose absence/presence at each position partitions the candidates into a large number of sets, each as small as possible.

    Update: expanding with an example: Suppose the word to be guessed matches S T _ _ _. Then suppose we are considering E for our next guess. All of the candidate words will then fall into one of these 8 classifications:

    S T _ _ _   (no E in the word)
    S T _ _ E   (E in last position ONLY)
    S T _ E _   (etc..)
    S T _ E E
    S T E _ _
    S T E _ E
    S T E E _
    S T E E E
    So we have 8 buckets, and we put each candidate word into the appropriate bucket. Suppose the bucket with the most words has n words in it. Then in the worst case, after guessing E, we will have n remaining candidates, so you can take n as the worst-case score of guessing E. Now compute this score for every letter and take the letter with the lowest score.
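    A minimal sketch of that minimax scoring, assuming the candidate list and the set of already-guessed letters are passed in (the names are mine):

        use strict;
        use warnings;

        # Bucket every candidate by the exact positions the letter would
        # occupy, score the letter by its largest bucket (the worst case),
        # and pick the letter whose worst case is smallest.
        sub best_guess {
            my ($words, $guessed) = @_;
            return unless @$words;
            my ($best, $best_score);
            for my $letter ('a' .. 'z') {
                next if $guessed->{$letter};
                my %bucket;
                for my $word (@$words) {
                    my $key = join '',
                        map { substr($word, $_, 1) eq $letter ? $letter : '_' }
                        0 .. length($word) - 1;
                    $bucket{$key}++;   # one bucket per positional signature
                }
                my ($worst) = sort { $b <=> $a } values %bucket;
                ($best, $best_score) = ($letter, $worst)
                    if !defined $best_score || $worst < $best_score;
            }
            return $best;
        }

    Swapping the largest-bucket score for scalar keys %bucket (maximized instead of minimized) gives the "number of non-empty buckets" variant below.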

    Note that there might be other ways to score each possible next-letter guess. Number of non-empty buckets comes to mind as an "average case" measure (to be maximized). Again, this is all assuming we're minimizing the total number of guesses; that way, all of the possible outcomes (i.e., the guessed letter appears or doesn't appear in the word) are treated the same. To minimize the number of wrong guesses, you have to treat the "doesn't appear in the word" outcome differently and weight things in some better way.

    blokhead