http://qs1969.pair.com?node_id=220552

I had an odd dream last night which I attribute to a combination of my recent viewing of the movie WarGames, and some others issues that I should discuss with a therapist.

In my dream I was trapped in a room in which the door would only open when the correct "code" word was spoken. I was provided a large book which was much like a dictionary, minus the definitions. So in my dream I began on page 1 of the book and began reading the words aloud but of course woke before "hitting" the correct one.

When I woke I of course wished I could have utilized Perl in my dream, which led to a more in depth analysis of the issue. I've since been contemplating what the best method would be to obtain the correct word in the shortest amount of time.

For example, if I start at the beginning of the list with the "A"'s and continue reading sequentially this method would work great if the "code word" was towards the beginning of the alphabet but would be horrible if the "code word" was "Zebra".

But are the overall odds the same of getting the correct word by randomly choosing from the list and by reading sequentially? Or would a "slope" of sorts that merges the two methods prove best?

For example, if you start from the beginning of the list and also the end, for example alternating A and Z words:

$guess_1... =~ /\AA/i; $guess_2... =~ /\AZ/i;
Where ultimately $guess_1 would be N+1 and $guess_2 would be N-1.

This method would provide the correct word fastest if it was located at the beginning or the end of the list. But what if the word was in the "M"'s?

So I thought about a function that returned a random word from the subset between the index's of $guess_1 and $guess_2:

$guess_1 < $guess_3 < $guess_2

So now we're left with a loop that try's $guess_1, $guess_2, $guess_3...$guess_1, $guess_2, $guess_3...

Any Math majors out there that can support || destroy my reasoning?

-Nitrox

Replies are listed 'Best First'.
Re: Dreams of Probability
by cjf-II (Monk) on Dec 17, 2002 at 16:11 UTC
Re: Dreams of Probability
by atcroft (Abbot) on Dec 17, 2002 at 16:14 UTC

    I had a similar discussion with a friend recently discussing security issues, so your post intrigued me.

    I am neither a math major nor a therapist, but given the criteria you specified (i.e., that the code word was chosen at random), it would seem to me that in either the case of a sequential or a random search, you average testing half of the words before finding an answer. The random search in that case would also involve more overhead in maintaining track of those words tested thus far. I would also suspect you would not gain much from alternating words, as the middle of the listing would still occur near the end of your search.

    The only possible improvement I could suggest, if any, would be that, if the length of time to try one word is proportional to its length (you did not say if this were the case, or if it was a constant time operation to test a word) to perhaps try ordering the list such that it is ordered first by word length, then alphabetically (e.g., qw(a i am an as at ax be bo...)), so as to progress through the simpler words first.

    Perhaps future dreams will provide you more clues to help in unlocking your mystery.

    Update: An explanation of my suggestion of ordering first by word length. If this door were meant to be opened, someone must remember the word. If they are security-minded, it will be a word they can recall easily, but should not be easily guessed by others knowing who has access. Recalling that the human mind (assumption: being with legal access is human) seems to work best with no more than 7 to 9 bits of information, it would seem a reasonable attack vector.

Re: Dreams of Probability
by adrianh (Chancellor) on Dec 17, 2002 at 15:58 UTC

    If the word is randomly chosen from the dictionary, then there isn't any general method that would get the solution faster than trying them all in order.

    For your particular dream - you might be able to make it easier if the lock isn't clever about words that have other words in them, e.g. if the password was "form" and you said "forming" would it unlock? If so you could go for the words that include other words in them to cut out more possibilities per-guess.

    Perl and a pronunciation dictionary might be able to help in this case :-)

      If the word is randomly chosen from the dictionary, then there isn't any general method that would get the solution faster than trying them all in order.

      Of course there is! Read the shortest words first where "short" refers to how quickly you can read the word aloud.

      Just be sure to not count the sorting time in the benchmark and don't let them find out about your strategy or they'll pick the longest word next time... unless you can let them know of your strategy in a way where they think you don't know that they know, in which case next time you'd be pretty sure they did pick the longest word... but they'd probably not pick the very longest word, just a really long word, so it'd be better if you could get them to think you thought they didn't know that you knew that they knew so they'd pick a really short word because you could go through short words faster since they are, well, short... unless there are a lot more short words in the dictionary than long words, ...

              - tye (well, you get the idea)
        Of course there is! Read the shortest words first where "short" refers to how quickly you can read the word aloud.

        You're right :-) I was assuming a constant time per authentication attempt.

Re: Dreams of Probability
by jynx (Priest) on Dec 17, 2002 at 21:21 UTC

    Some more metrics,

    Atcroft started to touch on this, but didn't actually extrapolate. The list of words can be shortened in more than one direction. The first is word length. No one in their right mind is going to go through the trouble of saying pneumonultramicroscopicsilicovolcanoconiosis or supercalifragilisticexpialidocious (i may have spelled one or both of those wrong) everytime they want to pass one door. Also words that are too short might be disallowed (like 'a', 'oh', 'it', etc...) so you can probably skip those as well.

    The second is the word's esoteric factor. Although this is largely gauged on whether or not they used a dictionary to pick the word or whether they just used a word from memory. That is to say, if they opened the dictionary to a random point and chose that word, then you cannot shorten the word list due to esoteric words being less inclined to be lodged in the average human's brain. But if it's chosen from memory...

    Personally i'm an esoteric sesquipedalian alien, and so using obscure or arcane wording would be fine, but most people haven't spent a lifetime trying to learn such strange words, so the word list could be shortened to those words within an epsilon of, say, the fourth grade reading level (the average reading level of most americans). If you put stock that these people are a bit more intelligent, up that list to 7th grade reading level. That's roughly the highest reading level in publication (IIRC). That's still far fewer words than the average dictionary.

    On the other hand, there is also the possibility that the word is not in the dictionary. Personally caddywhompus would be a fantastic word to use, especially since one will never find it in a dictionary (or at least, not one that i know of). According to your post this is not the case, so i won't attempt any thoughts toward this end :-)

    So other than those dictionary reductions, you'll still have to go through about half the dictionary that you end up with? Hmm, maybe not. First of all, because we always count numbers starting with 1, and because nice, round numbers appeal to the human intillect so much, if you look through a book and count the instance of each 0-9 digit in it, you'll come up with statistically more 1's than most other digits (i forget the actual spread; but 1 is very common).

    This is untested, but if the same works for the alphabet, than you'll have a greater chance of finding your word near the beginning of the alphabet. Or more accurately, you'll have a greater chance of finding the word starting with a letter that more commonly starts words than other letters. The most common letter is e, but it doesn't start as many words as s or t, so you could start there. Like i said, it's not proven, but i'm hypothesizing that it would reduce the amount of time used to find the word.

    Just my $0.02,
    jynx

      Great idea! Your posting, as do many at the monestary, exceeded seventh grade reading level quite handily:
      Fog15.2difficult
      Grade Level13(Flesch-Kincaid)
      Flesch50.6
      Complex words10.5%
      I measured this with my Style and Spelling Checker in Perl. (I'll spare you the results of the spell check :-).

      If I had to attack the lock with perl, I would read the words from a book, using a hash to skip all the words I had read before.

      I can imagine it now: "Call me Ishmael." - click.

      It should work perfectly the first time! - toma

      Or more accurately, you'll have a greater chance of finding the word starting with a letter that more commonly starts words than other letters. The most common letter is e, but it doesn't start as many words as s or t, so you could start there. Like i said, it's not proven, but i'm hypothesizing that it would reduce the amount of time used to find the word.

      I doubt it. If you consider the words to be equiprobable, then the probability for the code to lie in some subset of the dictionary is directly proportional to the size of the subset. So, the probability and the average time increase accordingly and the strategy is of no help.

      If the words were not equiprobable, instead, then the most advisable course of action would be choosing them ordered by probability (divided by a cost function if one wants to take length or other parameters into account).

      Cheers

      Antonio

      The blackness of the night echoes my soul - A. Tucket
      To expand on abell's excellent point in simpler words:
      you'll have a greater chance of finding the word starting with a letter that more commonly starts words than other letters.

      That won't reduce the length of your search, because the fact that it is more probable that a word starts with a certain letter means there are more words starting with that letter. So you are more likely to find the password in that subset, but to exhaust that subset you will have to make more guesses, so that you can make no fewer guesses regardless which way you choose to order them.

      In effect, any approach of changing the order of guesses can be mapped to linear scanning of entries in an accordingly shuffled list of words. Once you realize that it is easy to see that there is absolutely no way to guess the right word in less than (dictionary size / 2) attempts on average.

      Makeshifts last the longest.

Re: Dreams of Probability
by Mr. Muskrat (Canon) on Dec 17, 2002 at 16:08 UTC

    I had an odd dream last night which I attribute to a combination of my recent viewing of the movie WarGames, and some others issues that I should discuss with a therapist.

    Perhaps you could save yourself some money... Check out the Eliza modules at CPAN!

Re: Dreams of Probability
by Aristotle (Chancellor) on Dec 17, 2002 at 16:11 UTC

    As far as I can tell:

    If you repeatedly pick a word out of this dictionary at random, any method of guessing will require reading half the available words on average to succeed.

    As you noted, what your first attempt, ie reading from both the front and back simultaneously, effectively did is simply to reorder the dictionary such that words beginning with M will be at the end of the list. The more you randomize your reordering method, the harder the resulting list becomes to classify, but no matter how you turn it, if the word that was picked is toward the end of the shuffled list it will take many attempts to guess.

    Makeshifts last the longest.

Re: Dreams of Probability
by BrowserUk (Patriarch) on Dec 17, 2002 at 20:59 UTC

    Suggestion: Buy the book and read before going to bed.

    1. You might sleep better.
    2. If the dream becomes a recurring one, you might stand a chance of getting out before morning eventually as human beings can speak faster when recalling than when reading. Less IO. All you need is practice:^).

    Or you could try saying the letters of the alphabet one at a time and listening very carefully...you might here a click.


    Examine what is said, not who speaks.