comment on

All,
In Hangman Assistant, Lawliet posted an interesting problem. The unstated objective is to maximize the number of games won while minimizing the number of incorrect guesses. To summarize my participation in that thread, I found that with a comprehensive pruning algorithm, you could achieve great results (better than binary search) by choosing the letter that had the highest probability of being in the mystery word. Out of approximately 60K words, this approach won 96.28% of the time and only averaged 1.74 wrong guesses when winning. A nominal improvement can be made by breaking ties by looking at which letter has the highest overall density amongst candidate words (96.31% and 1.73 average wrong guesses).

I also postulated that it should be possible to do even better by taking into consideration other factors. I started to do this by examining just the words that the algorithm failed to win against. I am finding little time to work on this but since this may appeal to more people, I am presenting it as a challenge. Rather than starting from scratch, here is the data that you can assume has already been calculated for you to render your decision for "best letter to choose".

$num_of_incorrect_guesses_remaining;

$num_of_candidate_words;

%list_of_candidate_words;

$current_state_of_mystery_word; # 'four' may look like **u* meaning on
+ly u is known

%hash_of_letters_already_guessed;

# For each possible letter to choose from
# A hash with the following keys
    num_of_words_present_in
    total_count_of_letter_across_candidate_words
    probability_of_being_right
    probability_of_being_wrong
    num_of_candidates_remaining_if_wrong
    maximum_num_of_candidates_remaining_if_right

# Optionally for each possible letter to choose from
# A hash describing the count and position possibilities
# For instance:
    e => {
        '03' => 47,
        '1'   => 99
    },
# Means of the remaining candidate words, the letter 'e'
# Comes in two flavors, either only in the first position
# Or in positions 0 and 3.  So if e was chosen and correct
# Either 47 words would remain or 99 - max being 99
[download]

From this, you can easily duplicate my algorithm by determining which letter to choose based on which word appears in the most candidate words (highest probability of being right) and break ties by using the highest overall density. You should however be able to do so much more. There are a couple of main approaches that I can think of. First, a 1 size fits all algorithm that weighs various different pieces of data to produce an overall score. The advantage here is that you should be able to tune these weights without touching code to produce the best results for a given "dictionary". The second approach would be to have multiple algorithms already fine tuned for a given set of conditions and dispatch to that algorithm based on initial conditions. For instance, it may be ok to take a risk of guessing wrong if the remaining wrong guesses is still high or perhaps utilize the "simple" algorithm already presented unless the probabilities and candidates remaining (if right or wrong) is beyond some threshold.

Even though both of those approaches leave implementations wide open, don't assume those are the only two ideas to be had. Feel free to come up with whatever you think is best. Heck, you may even want to have a pre-process fine tuning algorithm based on the dictionary that allows weights and/or algorithms to be chosen by the data.

Cheers - L~R

In reply to Challenge: Design A Better Hangman Algorithm by Limbic~Region

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.