All,
I just came across a game called Bagels. Even though there are 648 possible starting numbers with those rules, you can come up with a strategy that will determine the solution without too many guesses. Having worked out such a strategy, I wondered if it was optimal. It is Friday after lunch and my brain has checked out for the weekend so I figured I would ask you monks and have a little fun in the process.

That means I am changing the rules.

So your challenge is to devise a strategy that, given the input conditions (number of positions and maximum possible value), produces a solution with the optimal number of guesses. Here is an example:

perl bagel_solver.pl p=4 v=14 s=1ACE
p is for the number of positions. v is for the maximum value 14 = 0-9, A, B, C, D, E. s is for the solution. The program will obviously play alone and not interactively.

Given the new rules, I am not sure if my strategy will still work so I post some time this weekend after I have had a chance to think about it. Until then, enjoy those bagels.

See also Mastermind.

Update: In the CB, MidLifeXis has pointed out that I have left a lot to be desired in how to choose a winner (scoring solutions) as well as restricting the input to command line arguments making a test harness prohibitive. I will be offline for much of the weekend and MidLifeXis has graciously agreed to take over explaining how those issues will be resolved :-)

Cheers - L~R

Replies are listed 'Best First'.
Re: Challenge: Optimal Bagels Strategy
by MidLifeXis (Monsignor) on Sep 25, 2009 at 18:23 UTC

    Last update: 2009/09/26 06:58 GMT -0500 (American/Chicago)

    Sure, open my mouth, and get volunteered :-)

    Corrections and Interpretations

    1. Restrictions on the number of colors to v >= p + 2

      I need to mull this one over and try it myself to see where the difficulty lies. If someone has a link to a paper that demonstrates why this is a difficult case, I would appreciate it. Due to the "no repeated colors" restriction, it is an unstated requirement in Limbic~Region's description that v >= p.

    2. Definition of v

      I am going to assume that Limbic~Region made a typo, unless it can be shown otherwise.

    3. v will define the alphabet as the first v characters of ( 0 .. 9, A .. Z ).

    So, the restrictions on p and v are as follows:

    • 1 <= p <= 10
    • p <= v <= 36

    You may assume that only values within these ranges will be given to your solution.

    Scoring

    To enable automated scoring, I propose that each solution be provided as a function accepting 3 parameters: p, v, and f (for example, solution(p, v, sub {check_solution(@_)}). The algorithm should return the solution. check_solution() (see below) will keep track of the number of tries and to verify that the algorithm actually found a solution. If the last check was not the correct solution when the algorithm terminates, the algorithm will not be successful.

    A function will be provided as the third parameter: check_solution(guess). check_solution will return a two element list: (picocount {number of correct, but wrong location}, fermicount {number of correct, and correct location}), or if you are familiar with bulls and cows, (cowcount, bullcount).

    Proposed scoring: open to suggestions. How should we handle possibly different results based on initial choice of the solution by the algorithm? How should we score invalid values submitted to check_solution, for example, 'AABB', which has repeated colors.

    My proposal: the same series of N challenges will be presented to each solution. The set of challenges may be hand selected, random, or some combination of both. Solutions will be scored based on number of correct solutions, then count of solutions where the minimum number of guesses was generated for that puzzle, then total guesses for all correct solutions, then speed (Ugh, this needs work. I would like to get the intent of LR's original challenge - finding the algorithm that provides the shortest path to every solution). If the check function receives an illegal solution, that challenge will be marked as incorrect. Time will be allocated while the solution is running, but not while it is in the provided check function (why should you be penalized when the check function could be inconsistently timed).

    Execution

    Your solution should be a sub as specified above. It will be run in a harness like the following:

    # I will write a working one of these for testing, # unless someone beats me to it. For testing purposes, # it should just need to record the number of times it # is called, and calculate pico and fermi. sub check_solution { # Update statistics # ... # Determine $pico and $fermi here # ... return ($picocount, $fermicount); } # Setup record keeping # ... # Choose number to pose to your_solution # ... # Launch individual solution $solution = your_solution( $p, $v, sub { check_solution(@_) } ); # Check statistics and record # ...

    Notes

    Note: I will be periodically updating this node. Please note the last update tag. Time is GMT -0500.

    Note 2: Runtime for p = 10, v = 36 may be prohibitively large for some solutions. >:-)

    --MidLifeXis

    Please consider supporting my wife as she walks in the 2009 Alzheimer's Walk.

      I have two questions.

      Firstly, could you restrict the number of colors to at least two more than the number of positions? (The case when the number of colors is equal to the number of positions can be a bit tricky, so such a restriction would simplify the code we write while still allowing a contest.) Alternately, could you just choose two or three pairs of parameters to which we should optimize our solutions for the contest?

      Secondly, could you clarify whether the v parameter is the number of colors, or one less than the number of colors as Limbic's post seems to imply?

      Thanks in advance.

      Solutions will be scored based on ... count of solutions where the minimum number of guesses was generated for that puzzle,

      You're going to determine the best possible solution for all ~ 382 duodecillion possible puzzles?


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        The previous sentence says “the same series of N challenges will be presented to each solution. The set of challenges may be hand selected, random, or some combination of both.” Obviously the next sentence refers to the number of solutions among those. Also, MidLifeXis will use his probability-fu to make sure the result is quite probably a good estimate of what you'd get from testing all solutions.

Re: Challenge: Optimal Bagels Strategy (Solutions)
by MidLifeXis (Monsignor) on Sep 26, 2009 at 13:23 UTC

    Please post solutions under here. I have included a framework....

    I have a harness here as well that you can use to test.

    --MidLifeXis

    Please consider supporting my wife as she walks in the 2009 Alzheimer's Walk.

      Brute force solution. I wanted to have it near the top (along with my random solution) to show how an inefficient, but obvious solution would work. I would hope that solutions down the line would have much better performance and more creativity. :-)

      Note: I have seen someone (cannot remember who) calling Knuth's algorithm for mastermind a brute force algorithm. If it is the one I am thinking of, it is not a brute force algorithm, but (1) trims impossible solutions, and (2) orders guesses by highest potential of helping do (1) on the next pass.

      --MidLifeXis

      Please consider supporting my wife as she walks in the 2009 Alzheimer's Walk.

Re: Challenge: Optimal Bagels Strategy
by JavaFan (Canon) on Sep 28, 2009 at 11:24 UTC
    I seem to recall that Knuth has written about this subject (Mastermind, but that seems essentially to be the same game, except that mastermind allows for repeats). IIRC, he has proven that (at least for a certain number of colours that positions), the best play after the first move is the one (or one of the ones) that minimizes the maximum number of possibles left, over all possible "pico/fermi/bagel" responses. I cannot remember what he said about first moves.

      Not having repeats makes the job of the person with the code much simpler, at least I think it does. The work that the person guessing the code has to do would then be based on the size of V compared to P.

      If you call each "color" Cn, and take the first P characters from a randomly sorted alphabet of size V, you essentially have a code of ...

      C1, C2, C3, ..., Cp

      If my grey matter is working properly, it does not matter what characters the code maker chooses, other than as a psychological exercise against the choice of the initial selections.

      If I ignore defending against the choice of initial selections, the code maker really only has 315 puzzles to generate.

      (p=1, v=1 ) .. (p=1, v=36) => 36 (p=2, v=2 ) .. (p=2, v=36) => 35 ... (p=10, v=10) .. (p=10, v=36) => 27 ((27 + 36) / 2) * 10 = 315

      Now, that is not to say that the code maker will probably not be choosing '0', '01', '012', etc as the codes. It also does not mean that there will not be some carefully chosen sequences designed to test the robustness of certain types of implementations. :-)

      --MidLifeXis

      Please consider supporting my wife as she walks in the 2009 Alzheimer's Walk.

      JavaFan,
      Years after this post, I discovered that my daughter is now playing a simplified version of this game in her 2nd grade classroom. I have yet to find a more optimal strategy then Knuth's although the game isn't exactly the same.

      Cheers - L~R

Re: Challenge: Optimal Bagels Strategy
by ELISHEVA (Prior) on Sep 30, 2009 at 11:50 UTC

    Increasing the possible values of $v has a minimal effect on the computational complexity of this problem. For any given value of $v "colors" and $p colors per solution, this puzzle can always be solved with a maximum of ceiling($v/$p)-1+ $p! guesses. The total number of permutations possible for a particular value of $v and $p has no impact.

    In the original bagel problem we have $p=3 and $v=10. Although there are 648 possible 3 digit permutations, according to this formula, the maximum number of guesses is never more than 9:

    ceiling($v/$p)=ceiling(10/3)=4 $p! = 3*2*1 = 6 #so... ceiling($v/$p)-1 + $p! = 3 + 6 = 9

    If we increase $v to its maximum proposed value, 36, we only add 8 more guesses ceiling($v/$p) only increases from 4 to 12. On the other hand if we increase $p to its maximum of 10, then the maximum number of guesses becomes ~ 10! or about 3.6 million.

    ceiling($v/$p)=ceiling(36/10) = 4 $p! = 10! #so... ceiling($v/$p)-1 + p! = 3 + 10!

    My gut sense is that this puzzle is NP complete within solution space defined by ceiling($v/$p)-1+p!. That is, we have indeed exhausted all the information available if we following the guessing strategy that led to that formula. You can come up with fewer guesses than that by sheer chance and a smart use of what chance gives you. For example, clever use of information from an initial guess whose response is $p-2 fermi and 2 pico can be solved much more quickly than a puzzle whose initial $p guesses are 1 pico each. If the luck of the draw gives you such an initial set of guesses, you can't use any sort of logic to ensure that your number of guesses will be less than ceiling($v/$p)-1+p!.

    Of course, all of this assumes that the maximum number of guesses really is ceiling($v/$p)-1+ $p!. The formula is based on the algorithm described below. I haven't converted it into code because I wanted to focus on explaining the logic, but it wouldn't be too difficult to do so. I'm often prone to overlook things when I do this kind of mathematical reasoning so a second pair of eyes for my argument would be much appreciated.

    The formula presumes a solution strategy with the following steps:

    1. order the set of values
    2. partition the values into sets each containing $p consecutive values. For the original bagel problems this would result in four sets: 0 123 456 789
    3. determine the number of solution values in each partition
    4. order the partitions based on the number of solution values in the partition.
    5. for each partition in turn, starting with the partition containing the most solution values, make guesses to determine which particular values from that partition are in the solution.
    6. make additional guesses to determine position, if needed
    7. If we have N possible guesses and have made N-1 then we know the answer to the Nth guess without actually making it. When that happens, we need to make one more final "guess" with the known solution.

    Determining number of digits in each partition

    If we have $p non-repeating values in the solution, and a range of $v values, then the number of partitions will be $n=$v/$p (integral division) if $v is evenly divisible by $p and $n+1 if not. More succintly, the number of partitions will be ceiling($v/$p). For the original bagel problem this means we never have more than 4 partitions: 0 123 456 789.

    If the total number of partitions is N, then we never need more than N-1 (ceiling($v/$p)-1) guesses to determine how the values are allocated amongst the partitions. Any values that aren't in the first N-1 partitions must be in the Nth partition, so there is no point in making a guess about that partition. This means that we will make $n-1 guesses if $v is evenly divisible by $p and $n guesses if not. For the original bagel problem we need to make 3 guesses (4-1).

    Determining the exact digits in the solution.

    The number of guesses needed to find the exact set of values in the solution (as opposed to their allocation amongst the partitions) depends on the way values are allocated among the partitions.

    All values in a single partition.For a partition with $p solution values, no guesses are needed because each and every solution value is in the partition.

    Values in two partitions: ($p-1, 1). If the values are split between two partitions, (I) with $p-1 values and (II) with just 1 solution value, our maximum number of guesses to identify the solution digits will be 2*$p-3.

    To make these guesses we take a random-maybe-bagel from partition II. We can use the pattern of answers from our first two guesses to determine whether we chose a bagel or not. They may also tell us which value in partition I is the bagel:

    • ($p, $p-1) - the value we selected from II is a non-bagel. Our first guess replaced the bagel in I with a non-bagel. Our second guess replaced a non-bagel with a non-bagel.
    • ($p-1, $p) - same as before except that our second guess was the one that replaced the bagel.
    • ($p-1, $p-1) - the value from II is a non-bagel and neither guess replaced the bagel
    • ($p-1, $p-2) - same as ($p, $p-1) except that the value we chose from II was a bagel. Replacing a bagel with a bagel leaves the total number of pico/fermi unchanged, hence we get $p-1. Replacing a non-bagel with a bagel, reduces the total pico/fermi count to $p-2.
    • ($p-2, $p-1) - we selected a bagel and our first guess replaced a non-bagel and our second replaced a bagel.
    • ($p-2, $p-2) - both guesses replaced a non-bagel.

    If it turns out that our first two guesses replaced non-bagels we will need to keep guessing by replacing each value in I with the value from II. We know from the first two guesses just how many pico/fermis there will be when we replace the bagel. By our $p-1 the guess we will know with a certainty where the bagel is. If none of our guesses so far have replaced the bagel, then the final value in the partition must be the bagel of partition I. Thus figuring out which values in partition I are in the solution never takes more than $p-1 guesses.

    If our value from partition II was a non-bagel then we're done since we also know the identity of the non-bagel in partition II. However, if we chose a bagel from partition II, then the solution value in II is in the remaining $p-1 value of II. To find which of these is a solution value we replace each value in turn with one of the non-bagels from I. When we get an answer with 2 pico/fermi instead of just 1, we know we've found the solution value. If our first $p-2 guesses always get just 1 pico/fermi, then we know the final remaining value in II is the solution value. Thus figuring out which value in partition II is in the solution takes only $p-2 guesses. In total, when the values are split between two partitions we never need more than 2*$p-3 guesses to determine which values are in the solution.

    Values in three partitions: ($p-2, 1, 1). The maximum number of guesses is (2*$p-3)+($p-2) = 3*$p-6. As just discussed we make $p-1 guesses for partition I and $p-2 guesses for partition II. When making our guesses for partition II we use a random-maybe-bagel from partition III. Our first two guesses on partition II will tell us whether or not this random-maybe-bagel is an actual bagel. Therefore when we start making guesses on partition III we only need to make $p-2 guesses.

    By now a pattern should be obvious: if our values are split between N partitions such that one partition has ($p-N+1) solution values and the remaining have 1 solution value each, then our total guesses are ($p-1)+(N-1)($p-2)=N*($p-2)+1 whenever N>=2. This value grows as N grows. The maximum number of partitions can never be more than $p, so maximum number of guesses, whatever the arrangement of partitions will always be $p*($p-2)+1

    There are of course other arrangements of solution values among the partitions. For example, I might have $p-2 and II might have 2 values. But these arrangements don't impact the maximum number of guesses. For instance, if partition I has 2 bagels we still know which two values are bagels after $p-1 guesses and we can still play the trick of using a random-maybe-bagel to reduce the number of guesses needed for each subsequent partition to $p-2.

    Determining the order of digits

    To complete the puzzle we need to find the order of values as well as their positions. The total number of guesses for a solution with $p values can never be more than $p! However, the actual number will be much less because each guess during the process of identifying partitions and solution values within partitions tells us something about the position of values as well. When we were trying to figure out the identity of the solution values we only cared about the total number of non bagels and ignored whether they were pico/fermi. Rearranging the values on each guess isn't going to turn a pico/fermi into a bagel, so it will have no effect on our reasoning when trying to identify which values are in the solution.

    If we carefully rearrange the digits for each guess, we can insure that each of Q guesses to figure out the identity of the solution digits eliminates one of the $p! possible orderings. If it takes at most Q guesses to figure out the digits then we need to make at most $p!-Q additional guesses to eliminate all remaining positions. Thus, once we know which partitions contain solutions, we never need to make more than the maximum of $p! or Q guesses. Since Q is never more than $p*($p-2)+1, the total number of guesses to figure out both position and identity of the solution digits never exceeds max($p!, $p*($p-2)+1). However, $p! is always greater than $p*($p-2)+1, so the maximum number of guesses is actually just $p!.

    Putting it all together

    Finding the allocation of solution values among the partitions never takes more than ceiling($v/$p) guesses. Once we know the allocation of solution values, we need a maximum of $p! guesses. Thus the formula for calculating the maximum number of guesses is: ceiling($v/$p) - 1 + $p!

    Best, beth

      Upon further reflection I'm no longer convinced that this is an NP hard problem, but this may be because I'm a bit fuzzy on the nuances of the definition of NP hard.

      Although the worst case is always proportional to $p!, this problem still isn't as "hard" as a problem where we simply got the number of non-bagels and "yes/no" about whether they were in the right order. If we just got "yes/no" then once we knew all the digits, every additional guess would at most eliminate one of $p! possible permutations. However, in our case we can sometimes know how many are in the right position. Each time we get lucky and make a guess with one additional fermi, we may be able to reduce the number of potential remaining solutions by more than just 1.

      For example, suppose the problem selector chooses a solution that will yield $p-$x fermi and $x pico, on the first guess. The number of follow on guesses needed will always be ($p-1) + ($x-1)!. We need ($p-1) to find the identity of out of order elements and at most ($x-1)! guesses to put the elements in the correct position. We can identify the $x out of place digits by taking a known bagel and replacing each digit in turn. Any guess whose pico count drops by 1 will be replacing an out of order digit. After $p-1 such guesses we will know the identity of all out of place values. To put them in the right order, we need to potentially try all possible orderings where none of the $x elements are in their original position. The first of $x elements may be in ($x-1) locations, the second in any of the ($x-2) remaining and so on. Thus we need to consider ($x-1)! arrangements.

      However, we only need to make ($x-1)! guesses to put things in the right order if the number of pico is $x for all but the final "guess" (the guess we make when we know the answer but have never actually guessed that particular permutation). If after Q guesses we get $y additional fermi, then instead of guessing each remaining ($x-1)! arrangement, the number of remaining guesses is the minimum of ($x-1)! - Q and ($x-1)+($x-$y-1)!.

      However, as I stated earlier, we only have the possibility of making a more informative guess. If "fermis" come late enough in the guessing process we may not be able to use them to reduce the total number of guesses to less than ceiling($v/$p) - 1 + $p!. Some clarification by someone with a firmer understanding of what exactly is meant by NP-hard would be really appreciated so I can use these terms more appropriately.

      Best, beth

        If you are an ACM member, a pretty good description can be found at the beginning of The status of the P versus NP problem, an article in the September 2009 issue of Communications.

        Update: It looks like a free web account (yeah, I know, NYT model) will allow reading of this article. It also looks like the digital edition may allow reading without registration.

        --MidLifeXis

        Please consider supporting my wife as she walks in the 2009 Alzheimer's Walk.