in reply to Challenge: Optimal Bagels Strategy

Increasing $v, the number of possible values, has only a minimal effect on the computational complexity of this problem. For any given number of "colors" $v and $p colors per solution, the puzzle can always be solved with at most ceiling($v/$p) - 1 + $p! guesses. The total number of permutations possible for a particular $v and $p has no direct impact on that worst case.

In the original bagel problem we have $p=3 and $v=10. Although there are 648 possible 3-digit permutations, this formula says the maximum number of guesses is never more than 9:

ceiling($v/$p) = ceiling(10/3) = 4
$p! = 3*2*1 = 6
# so... ceiling($v/$p) - 1 + $p! = 3 + 6 = 9

If we increase $v to its maximum proposed value, 36, we add only 8 more guesses, because ceiling($v/$p) only increases from 4 to 12. On the other hand, if we increase $p to its maximum of 10, the maximum number of guesses becomes roughly 10!, or about 3.6 million.

ceiling($v/$p) = ceiling(36/10) = 4
$p! = 10!
# so... ceiling($v/$p) - 1 + $p! = 3 + 10!
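For reference, here is a minimal Perl sketch of the formula above (the sub names max_guesses and factorial are mine, not part of the original challenge):

  use strict;
  use warnings;
  use POSIX qw(ceil);

  sub factorial {
      my ($n) = @_;
      my $f = 1;
      $f *= $_ for 2 .. $n;
      return $f;
  }

  sub max_guesses {
      my ($v, $p) = @_;    # $v possible values, $p values per solution
      return ceil($v / $p) - 1 + factorial($p);
  }

  print max_guesses(10, 3),  "\n";   # original bagels: 3 + 6   = 9
  print max_guesses(36, 3),  "\n";   # more values:    11 + 6   = 17
  print max_guesses(36, 10), "\n";   # more positions:  3 + 10! = 3628803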

My gut sense is that this puzzle is NP-complete within the solution space defined by ceiling($v/$p) - 1 + $p!. That is, the guessing strategy that leads to that formula really does exhaust all the information available. You can come in under that bound by sheer chance and a smart use of what chance gives you. For example, a puzzle whose initial guess returns $p-2 fermi and 2 pico can, with clever use of that information, be solved much more quickly than one whose first $p guesses return 1 pico each. But unless the luck of the draw hands you such an informative start, no amount of logic will guarantee that your number of guesses comes in below ceiling($v/$p) - 1 + $p!.

Of course, all of this assumes that the maximum number of guesses really is ceiling($v/$p) - 1 + $p!. The formula is based on the algorithm described below. I haven't converted it into code because I wanted to focus on explaining the logic, but it wouldn't be too difficult to do so. I'm prone to overlooking things when I do this kind of mathematical reasoning, so a second pair of eyes on my argument would be much appreciated.

The formula presumes a solution strategy with the following steps:

  1. order the set of values
  2. partition the values into sets each containing $p consecutive values. For the original bagel problem this would result in four sets: 0 123 456 789
  3. determine the number of solution values in each partition
  4. order the partitions based on the number of solution values in the partition.
  5. for each partition in turn, starting with the partition containing the most solution values, make guesses to determine which particular values from that partition are in the solution.
  6. make additional guesses to determine position, if needed
  7. If we have N possible guesses and have made N-1 of them, then we know what the Nth guess must be without actually making it. When that happens, we still need to make one final "guess" with the known solution.

Determining the number of solution values in each partition

If we have $p non-repeating values in the solution, and a range of $v values, then the number of partitions will be $n = $v/$p (integer division) if $v is evenly divisible by $p and $n+1 if not. More succinctly, the number of partitions will be ceiling($v/$p). For the original bagel problem this means we never have more than 4 partitions: 0 123 456 789.
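A quick sketch of that partitioning in Perl (hypothetical code; here I group from the low end, so the sets come out as 0 1 2 / 3 4 5 / 6 7 8 / 9 rather than 0 / 123 / 456 / 789, but the count is the same ceiling($v/$p)):

  use strict;
  use warnings;
  use POSIX qw(ceil);

  my ($v, $p) = (10, 3);
  my @values = (0 .. $v - 1);                  # the ordered set of values

  my @partitions;
  push @partitions, [ splice @values, 0, $p ] while @values;

  printf "%d partitions, ceiling(%d/%d) = %d\n",
      scalar @partitions, $v, $p, ceil($v / $p);
  print "@$_\n" for @partitions;               # 0 1 2 / 3 4 5 / 6 7 8 / 9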

If the total number of partitions is N, then we never need more than N-1 (ceiling($v/$p)-1) guesses to determine how the values are allocated amongst the partitions. Any values that aren't in the first N-1 partitions must be in the Nth partition, so there is no point in making a guess about that partition. This means that we will make $n-1 guesses if $v is evenly divisible by $p and $n guesses if not. For the original bagel problem we need to make 3 guesses (4-1).
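Here is a hedged sketch of those ceiling($v/$p) - 1 allocation guesses. The oracle_count sub is a stand-in for the game's reply: it just reports pico + fermi combined, i.e. how many guessed values appear anywhere in a hypothetical secret:

  use strict;
  use warnings;

  my @secret     = (2, 5, 9);                            # hypothetical solution, $p = 3
  my @partitions = ([0, 1, 2], [3, 4, 5], [6, 7, 8], [9]);

  sub oracle_count {                                     # pico + fermi for one guess
      my ($guess) = @_;
      my %in_secret = map { $_ => 1 } @secret;
      return scalar grep { $in_secret{$_} } @$guess;
  }

  # Guess every partition except the last; the last one's count is whatever is left.
  my @counts = map { oracle_count($_) } @partitions[0 .. $#partitions - 1];
  my $known  = 0;
  $known += $_ for @counts;
  push @counts, scalar(@secret) - $known;

  print "solution values per partition: @counts\n";      # 1 1 0 1 for this secret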

Determining the exact values in the solution

The number of guesses needed to find the exact set of values in the solution (as opposed to their allocation amongst the partitions) depends on the way values are allocated among the partitions.

All values in a single partition. If a single partition holds all $p solution values, no guesses are needed to identify them, because every value in that partition must be in the solution.

Values in two partitions: ($p-1, 1). If the values are split between two partitions, (I) with $p-1 solution values and (II) with just 1 solution value, our maximum number of guesses to identify the solution values will be 2*$p-3.

To make these guesses we take a random maybe-bagel from partition II and use it to replace, one at a time, the values in partition I. We can use the pattern of answers from our first two guesses to determine whether the value we chose from II is a bagel or not; they may also tell us which value in partition I is the bagel:

If it turns out that our first two guesses replaced non-bagels, we keep guessing, replacing each value in I in turn with the value from II. We know from the first two guesses exactly how many pico/fermi there will be when we finally replace the bagel. By our ($p-1)th guess we will know with certainty where the bagel is: if none of our guesses so far has replaced the bagel, then the final value in the partition must be the bagel of partition I. Thus figuring out which values in partition I are in the solution never takes more than $p-1 guesses.

If our value from partition II was a non-bagel then we're done, since we then also know the identity of the non-bagel in partition II. However, if we chose a bagel from partition II, then the solution value in II is among the remaining $p-1 values of II. To find which of these is the solution value we replace each value in turn with one of the non-bagels from I. When we get an answer with 2 pico/fermi instead of just 1, we know we've found the solution value. If each of our first $p-2 guesses gets just 1 pico/fermi, then we know the final remaining value in II is the solution value. Thus figuring out which value in partition II is in the solution never takes more than $p-2 guesses. In total, when the values are split between two partitions we never need more than 2*$p-3 guesses to determine which values are in the solution.

Values in three partitions: ($p-2, 1, 1). The maximum number of guesses is (2*$p-3) + ($p-2) = 3*$p-5. As just discussed, we make at most $p-1 guesses for partition I and at most $p-2 guesses for partition II. When making our guesses for partition II we use a random maybe-bagel from partition III, and our first two guesses on partition II tell us whether or not that maybe-bagel is an actual bagel. Therefore when we start making guesses on partition III we only need to make at most $p-2 of them.

By now a pattern should be obvious: if our values are split among N partitions such that one partition has ($p-N+1) solution values and the rest have 1 solution value each, then our total number of guesses is ($p-1) + (N-1)*($p-2) = N*($p-2)+1 whenever N >= 2. This value grows as N grows. The number of partitions holding solution values can never be more than $p, so the maximum number of guesses, whatever the arrangement of partitions, will always be at most $p*($p-2)+1.
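A quick numeric check of that pattern (a sketch; $n here stands for the number of partitions that hold solution values):

  use strict;
  use warnings;

  my $p = 3;                                   # original bagels
  for my $n (1 .. $p) {
      my $guesses = $n == 1 ? 0 : $n * ($p - 2) + 1;
      print "$n partition(s) with solution values: at most $guesses guesses\n";
  }
  # The maximum, $p * ($p - 2) + 1, occurs when the solution values are
  # spread across all $p partitions.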

There are, of course, other arrangements of solution values among the partitions. For example, partition I might have $p-2 solution values and II might have 2. But these arrangements don't increase the maximum number of guesses. For instance, if partition I has 2 bagels we still know which two values are bagels after $p-1 guesses, and we can still play the trick of using a random maybe-bagel to reduce the number of guesses needed for each subsequent partition to $p-2.

Determining the order of digits

To complete the puzzle we need to find the order of the values as well as their identities. The total number of guesses for a solution with $p values can never be more than $p!. However, the actual number will be much less, because each guess made while identifying partitions and the solution values within them tells us something about the position of values as well. When we were trying to figure out the identity of the solution values we only cared about the total number of non-bagels and ignored whether they were pico or fermi. Rearranging the values on each guess isn't going to turn a pico/fermi into a bagel, so it has no effect on our reasoning when trying to identify which values are in the solution.

If we carefully rearrange the digits for each guess, we can ensure that each of the Q guesses used to figure out the identity of the solution digits also eliminates one of the $p! possible orderings. If it takes at most Q guesses to figure out the digits, then we need to make at most $p!-Q additional guesses to eliminate all remaining orderings. Thus, once we know which partitions contain solution values, we never need to make more than max($p!, Q) guesses. Since Q is never more than $p*($p-2)+1, the total number of guesses to figure out both the position and the identity of the solution digits never exceeds max($p!, $p*($p-2)+1). However, $p! is always greater than $p*($p-2)+1, so the maximum number of guesses is actually just $p!.
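A small sanity check (hypothetical code) that $p! really does dominate $p*($p-2)+1 over the proposed range of $p:

  use strict;
  use warnings;

  my $fact = 1;
  for my $p (1 .. 10) {
      $fact *= $p;
      my $q_max = $p * ($p - 2) + 1;
      printf "p = %2d   p! = %7d   p*(p-2)+1 = %3d   %s\n",
          $p, $fact, $q_max, $fact > $q_max ? "p! dominates" : "p! does NOT dominate";
  }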

Putting it all together

Finding the allocation of solution values among the partitions never takes more than ceiling($v/$p) - 1 guesses. Once we know the allocation of solution values, we need at most $p! more guesses. Thus the formula for the maximum number of guesses is: ceiling($v/$p) - 1 + $p!

Best, beth

Re^2: Challenge: Optimal Bagels Strategy
by ELISHEVA (Prior) on Sep 30, 2009 at 16:50 UTC

    Upon further reflection I'm no longer convinced that this is an NP-hard problem, but this may be because I'm a bit fuzzy on the nuances of the definition of NP-hard.

    Although the worst case is always proportional to $p!, this problem still isn't as "hard" as a problem where we simply got the number of non-bagels and "yes/no" about whether they were in the right order. If we just got "yes/no" then once we knew all the digits, every additional guess would at most eliminate one of $p! possible permutations. However, in our case we can sometimes know how many are in the right position. Each time we get lucky and make a guess with one additional fermi, we may be able to reduce the number of potential remaining solutions by more than just 1.

    For example, suppose the problem selector chooses a solution that will yield $p-$x fermi and $x pico on the first guess. The number of follow-on guesses needed is then at most ($p-1) + ($x-1)!. We need at most ($p-1) guesses to find the identity of the out-of-place elements and at most ($x-1)! guesses to put those elements in the correct position. We can identify the $x out-of-place digits by taking a known bagel and replacing each digit in turn: any guess whose pico count drops by 1 is replacing an out-of-place digit. After $p-1 such guesses we will know the identity of all out-of-place values. To put them in the right order, we potentially need to try every ordering in which none of the $x elements is in its original position. The first of the $x elements may go in any of ($x-1) locations, the second in any of the ($x-2) remaining, and so on, so we need to consider ($x-1)! arrangements.
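    A small Perl sketch of that bound (hypothetical code; $x is the number of out-of-place values reported on the first guess):

      use strict;
      use warnings;

      sub factorial { my $f = 1; $f *= $_ for 2 .. $_[0]; $f }

      my $p = 10;
      for my $x (2 .. $p) {
          printf "x = %2d pico on the first guess: at most %d follow-on guesses\n",
              $x, ($p - 1) + factorial($x - 1);
      }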

    However, we need the full ($x-1)! guesses to put things in the right order only if the pico count stays at $x for all but the final "guess" (the guess we make when we already know the answer but have never actually guessed that particular permutation). If after Q guesses we get $y additional fermi, then instead of guessing each of the remaining ($x-1)! arrangements, the number of remaining guesses is the minimum of ($x-1)! - Q and ($x-1) + ($x-$y-1)!.

    However, as I stated earlier, a more informative guess is only ever a possibility. If the fermis come late enough in the guessing process we may not be able to use them to reduce the total number of guesses below ceiling($v/$p) - 1 + $p!. Some clarification from someone with a firmer understanding of what exactly is meant by NP-hard would be much appreciated, so I can use these terms more appropriately.

    Best, beth

      If you are an ACM member, a pretty good description can be found at the beginning of The status of the P versus NP problem, an article in the September 2009 issue of Communications of the ACM.

      Update: It looks like a free web account (yeah, I know, NYT model) will allow reading of this article. It also looks like the digital edition may allow reading without registration.

      --MidLifeXis

      Please consider supporting my wife as she walks in the 2009 Alzheimer's Walk.

        Thanks a lot for the link! I really enjoyed the article. I think what was confusing me was the nature of "hardness": is it measured by the worst case or the average case? If I understand the article, P and NP refer to the worst case. But many problems present a very different picture if we consider average-case complexity rather than worst-case complexity (see the middle of the article).

        Knowing how many are in the right position, rather than merely a binary "all in position / one or more out of position", reduces the average complexity, but not the worst-case complexity. If we consider worst-case complexity, both problems (guesses answered with the number in the correct position vs. a binary all/one-or-more) are equally hard and fulfill the definition of NP, but apparently not P ("apparently" because P != NP has not yet been formally proven). It satisfies NP because if we had some sort of miraculous machine that could try all possible guesses at once (i.e. a non-deterministic Turing machine), then we could solve this problem in polynomial time. Verification would be the only step left, and to verify a solution we need one equality operation per value, which is essentially O($p). However, it appears to fail P because, if we are restricted to a deterministic Turing machine (which can only try possible solutions sequentially), we can't solve this in polynomial time: we have to search the entirety of a solution space that scales according to $p! rather than $p^n for some fixed n. Do I have this right?
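        To make the verification claim concrete, here is a tiny sketch (the sub name is mine): checking a candidate against the secret is one comparison per position, i.e. O($p).

          use strict;
          use warnings;

          sub is_solution {                    # O($p): one comparison per value
              my ($guess, $secret) = @_;       # array refs of equal length $p
              for my $i (0 .. $#$secret) {
                  return 0 unless $guess->[$i] == $secret->[$i];
              }
              return 1;
          }

          print is_solution([2, 5, 9], [2, 5, 9]) ? "verified\n" : "not a match\n";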

        Best, beth