Seven by seven farming puzzle

In 2007, I read the following task on an online puzzle site (exercise 5 here).

Cut a 6 times 6 chessboard to 18 black and 18 white squares. Assemble them to a 6 times 6 board in such a way that each row and column has 3 black and 3 white squares, every row is different, and every column is different. How many boards can you get this way? The rows and columns of the board are numbered, so boards that differ in rotation or flipping count as different.

As discussed on the J wiki, this puzzle is quite easy to solve with brute-force.

In 2008, we decided to post a larger variant of this puzzle on the first round of the Challenge24 contest. The (contest webpage has the full problem set downloadable, this is problem F.) The task was basically the following.

A seven by seven cell table has to be filled in such a way that each row and each column must have exactly two cells with a 0, two cells with a 1, and three cells with a 2 in them; no two rows can be exactly the same; and no two columns can be exactly the same. Given the contents of the first two rows, compute the number of ways the rest of the table can be filled.

With current computers, it's almost impossible to brute force this task by enumerating all possible solutions.

There is one simple observation that cuts down quite some time though.

The permutation of the bottom 5 rows doesn't matter, so it's enough to iterate on those possibilities where the bottom rows are in lexicographic order, and multiply the result you get with 120 (which is 5 factorial).

However, brute forcing takes too much time even after knowing this trick. The first working perl program I wrote takes about twenty minutes on one input. Such a long running time would be unacceptable, as we want to give ten sample inputs, and the contestants would have five hours for the whole contest (reading the tasks, development, running the program, submitting the answer, all this for all eight tasks, with only three people in a team). I didn't despair though, for I already knew that that solution can be optimized a lot.

My final solution is more complicated and much faster: it runs on any one input in at most six seconds (that was back in 2008, computers are twice as fast now). That's fast enough, so we could post this problem. (In fact I was quite surprised on how much faster the program became.)

The remainder of this post explains how my solution works, and shows the code too. I've spoilered this in case you want to try to write a solution yourself.

First, let's discuss what technique this first program uses, for it's not a trivial implementation, and the final, fast program is a development of that.

I'm representing each row as a single integer fingerprint whose seven base 100 digits are 10, 1 or 0 depending on the contents of the cell. This way, you can tell if each column has exactly two 0, two 1 and three 2 values by adding the seven fingerprints representing the rows and see if you get 2222_222222_2222 as the result. I'd of course precompute all possible rows (all permutations of two 0, two 1, and three 2), and then the program only has to iterate on all combinations of five different of these, check if the columns are right and if there are no two identical columns. (We only need to check each combination, not each variation, per the note above.)

The next version of the program runs about four times faster. Instead of trying all possible rows for the last row, I compute the only numbers to make the distribution of each column right by subtracting the fingerprints of the six rows from 2222_222222_222, then check if that's a possible row and not identical to one of the rows above.

The most important optimization of the final solution is the following. I don't iterate on all combinations of four rows and compute the last one. Instead, I first do a fast loop where I iterate on all possible combinations of the last two rows, and store these in a hash keyed by the sum of the two fingerprints. (There can be more than one pairs of rows for each such key, but never more than a few of them.) Then, I iterate on all possible combinations of the middle three rows, compute the sum of fingerprints the last two rows need to give, and look up what possible last two rows can complete the table. I lose something by this method, for now I need to check whether the sixth row is lexicographically greater than the fifth row (the middle three rows and the last two rows themselves are already sorted lexicographically), and throw away some potential solutions where this is false. This, however, is nothing compared to how much speed I gain by having to loop only three levels deep instead of four levels like above.

The final optimization concerns how I check whether the table has two identical <s>rows</s> columns, for I always have to do this in all solutions above. For this, I make a secondary fingerprint for each row. This is an integer of 21 bits: each bit corresponds to a pair of column indices in the table, and each bit is set in a fingerprint if those two cells in the row have an equal value. These numbers have the property that the bitwise and of the fingerprints of the seven rows are zero if an only if there are no two identical columns in the table. With these values precomputed for all possible rows, we can do the check very fast.

Here follows the full code, with the inputs hardcoded in the DATA section. The code is a bit ugly, for I wrote it quick and dirty to simulate how a contestant could do it live in the contest.

#!perl

use warnings; use strict;
use IO::Handle;
use Time::HiRes "time";

sub aeq {
    my($a, $b) = @_;
    @$a == @$b or die;
    for (0 .. @$a - 1) {
        $$a[$_] == $$b[$_] or return;
    }
    return 1;
}

my @poss;
{
    my @pos = (0, 0, 0, 0, 0, 0, 0);
    while (1) {
        my $k = @pos;
        while (0 <= $k && 2 <= $pos[--$k]) { $pos[$k] = 0; }
        -1 == $k and last;
        $pos[$k]++; 
        my @freq = (0)x3;
        $freq[$_]++ for @pos;
        aeq(\@freq, [2, 2, 3]) or next;
        push @poss, [@pos];
    }
}
#warn 0+@poss; # 210

sub key {
    0 + join "", map { ("10", "01", "00")[$_] } @_;
}
sub ckey {
    7 == @_ or die;
    my $r = 0;
    for my $x (1 .. 6) {
        for my $y (0 .. $x - 1) {
            $r <<= 1;
            $r += $_[$x] == $_[$y];
        }
    }
    0 + $r;
}

my $INF = *DATA;
my $num_inp = 0 + <$INF>;
for (1 .. $num_inp) {
    my $begin_time = time;
    my @r0 = <$INF> =~ /[012]/g;
    my @r1 = <$INF> =~ /[012]/g;
    my $s2 = key(@r0) + key(@r1);
    my $cs2 = ckey(@r0) & ckey(@r1);
    my @posr = grep { !aeq(\@r0, $_) && !aeq(\@r1, $_) } @poss;
    @posr == @poss - 2 or do {
        print STDERR "i ", join(" ", @r0), "\n  ", join(" ", @r1), "\n
+";
        print STDERR "w 0\n";
        print "0\n";
        flush STDOUT;
        next;
    };
    my @posk = map { key @$_ } @posr;
    my @posck = map { ckey @$_ } @posr;
    # most a feladat olyan az $m2 < $m3 < $m4 < $m5 < $m6 indexek össz
+eszámolása, amelyekre
    # $posk[$mi] összege plusz $s2 egyenlő 222222_2222_2222, 
    # és $posck[$mi] bitwise and-je and $cs2 egyenlő & 0.
    my %sx_lookup;
    for my $m5 (3 .. @posr - 2) {
        for my $m6 ($m5 + 1 .. @posr - 1) {
            my $sx = 222222_2222_2222 - $s2 - $posk[$m5] - $posk[$m6];
            my $csx = $cs2 & $posck[$m5] & $posck[$m6];
            push @{$sx_lookup{0 + $sx}}, [$m5, $m6, $csx];
        }
    }
    my($T) = (0);
    print STDERR "i ", join(" ", @r0), "\n  ", join(" ", @r1), "\n";
    for my $m2 (0 .. @posr - 5) {
        for my $m3 ($m2 + 1 .. @posr - 4) {
        for my $m4 ($m3 + 1 .. @posr - 3) {
            my $sx = $posk[$m2] + $posk[$m3] + $posk[$m4];
            my $csb = $posck[$m2] & $posck[$m3] & $posck[$m4];
            for my $rec (@{$sx_lookup{0 + $sx}}) {
                my($m5, $m6, $csx) = @$rec;
                $m4 < $m5 or next;
                my $cs = $csb & $csx;
                0 == $cs or next;
                $T++;
            }
        }}
        print STDERR "."; 
    }
    my $R = $T * (5*4*3*2*1);
    print $R, "\n";
    flush STDOUT;
    print STDERR "\no $T $R\n";
    printf STDERR "t %.2f sec\n", time - $begin_time;
}

__DATA__
10
2 1 0 0 2 2 1
0 2 2 1 1 0 2
2 0 0 2 2 1 1
2 0 1 2 2 1 0
2 1 0 1 2 0 2
2 2 1 1 2 0 0
2 1 0 1 2 2 0
1 2 0 2 1 2 0
0 2 0 1 2 1 2
1 2 0 2 1 0 2
2 0 0 1 2 2 1
2 1 1 0 2 2 0
1 2 2 0 0 2 1
2 0 1 0 1 2 2
2 2 1 1 0 2 0
1 0 2 1 2 0 2
2 0 1 1 2 0 2
0 1 2 2 0 1 2
0 2 2 0 1 2 1
1 0 1 2 2 2 0
[download]

There is one more notable point that surprised me when I found out during the preparations. This is that there are only fourteen (14) significantly different input data you can give in this problem. (This didn't help the contestants though, for we gave ten significantly different inputs, and this is the kind of contest where the contestants run their programs on their own computers and submit only the result.) There are thirteen possible outputs, for two of the inputs have an equal output for a reason unknown to me.

You may want to compute these fourteen inputs (pairs of two lines): this is much easier than solving the original problem. Here are them in canonical form in case you want to check.

0 0 1 1 2 2 2
0 0 1 2 1 2 2

0 0 1 1 2 2 2
0 0 2 2 1 1 2

0 0 1 1 2 2 2
0 1 0 1 2 2 2

0 0 1 1 2 2 2
0 1 0 2 1 2 2

0 0 1 1 2 2 2
0 1 1 2 0 2 2

0 0 1 1 2 2 2
0 1 2 2 0 1 2

0 0 1 1 2 2 2
0 2 0 2 1 1 2

0 0 1 1 2 2 2
0 2 1 2 0 1 2

0 0 1 1 2 2 2
0 2 2 2 0 1 1

0 0 1 1 2 2 2
1 1 0 0 2 2 2

0 0 1 1 2 2 2
1 1 0 2 0 2 2

0 0 1 1 2 2 2
1 1 2 2 0 0 2

0 0 1 1 2 2 2
1 2 0 2 0 1 2

0 0 1 1 2 2 2
1 2 2 2 0 0 1
[download]

Update: changed broken link from contest's old homepage to new homepage. Update 2015-11-26: fix typo.

Comment on Seven by seven farming puzzle Select or Download Code

Replies are listed 'Best First'.
Re: Seven by seven farming puzzle by salva (Canon) on Feb 04, 2010 at 12:17 UTC
Inspired by SWI-Prolog is a blindingly fast application component., I have done it using SWI-Prolog: Read more... (4 kB) update: oops, wrong problem...	[reply] [d/l] [select]
Re^2: Seven by seven farming puzzle by salva (Canon) on Feb 04, 2010 at 12:42 UTC
Read more... (4 kB) update: oops2, wrong problem again...	[reply] [d/l] [select]
Re^3: Seven by seven farming puzzle by ambrus (Abbot) on Feb 04, 2010 at 18:19 UTC
Okay, so after the criticism of your code I guess I must show an example of what I mean. The code below solves the six by six problem, for as I said in the original node, the seven by seven is not so easy. It's quite straightforward brute forcing: it tries all possibilities. It is not using the trick in the first spoiler block (which would make it faster). After filling each row, it checks that there are no two identical rows. Also, instead of iterating over all possible last rows, we compute its elements so that the columns are right. (In addittion, we don't need to check if the last row is right, for if each column and each other row has three white and three black squares each then the last column does too automatically.) A similar trick is used for the fifth row: instead of iterating through all possible fifth rows, we fill the fifth cell of each column in such a way that the column doesn't have more than three white or more than three black cells, and then check if the fifth row we've got is correct. We check that there are no two identical columns at the end, after filling the whole table. The code runs in SWI prolog in 107 seconds (on a fast computer), and produces the correct answer (194400). (Start with the goal `main`; some debugging lines are printed to show progress.) I hope my prolog coding style isn't too strange for you to follow. % Six by six chessboard problem, see "http://www.perlmonks.org/?node_i +d=821272" % Logical programming variant row(R) :- length(R, 6), rowpart(6, R). rowpart(T, R) :- rowpart(3, 3, T, R). rowpart(X, Y, 0, _) :- 0 =< X, 0 =< Y. rowpart(X, Y, T, [b \| R]) :- succ(Xp, X), succ(Tp, T), rowpart(Xp, Y, +Tp, R). rowpart(X, Y, T, [w \| R]) :- succ(Yp, Y), succ(Tp, T), rowpart(X, Yp, +Tp, R). board(B) :- B = [R0, R1, R2, R3, R4, R5], for(lambda(arg(R), length(R, 6)), B), transpose(B, BT), row(R0), write(board_debug0([R0])), nl, row(R1), R0 \== R1, row(R2), for(lambda(arg(R), R \== R2), [R0, R1]), row(R3), for(lambda(arg(R), R \== R3), [R0, R1, R2]), for(lambda(arg(C), rowpart(5, C)), BT), row(R4), for(lambda(arg(R), R \== R4), [R0, R1, R2, R3]), for(lambda(arg(C), row(C)), BT), %row(R5), % automatically true for(lambda(arg(R), R \== R5), [R0, R1, R2, R3, R4]), iota(6, I6), for(lambda(arg(M), ( take(M, BT, CB), nth0(M, BT, C), for(lambda(arg(U), U \== C), CB))), I6). nboard(N) :- findall(x, board(_B), L), length(L, N). main :- nboard(N), write(N), nl. /* [download] Read more... (4 kB)	[reply] [d/l] [select]
Re^3: Seven by seven farming puzzle by ambrus (Abbot) on Feb 04, 2010 at 16:12 UTC
Sadly you have to find the number of solutions, not just find one solution. Given the contents of the first two rows, compute the number of ways the rest of the table can be filled. For example, for the input you consider above, `[[2, 1, 0, 0, 2, 2, 1], [0, 2, 2, 1, 1, 0, 2]]`, the output should be `4909920` because that's the number of solutions. Finding just one solution (for each of the ten inputs) would be much easier done by hand than by a program. In addittion, this part looks very inefficient to me: `different_lists([A\|TA], [B\|TB]) :- ( A #\= B ; different_lists(TA, TB)).` [download] wouldn't this backtrack multiple times for any two lists that differ in more than one place? Wouldn't it cause to return a single solution multiple times? It seems like a bad idea to me to mix constraint programming with this kind of early backtracking. If I had to write this program, I'd either not use any constraint programming and backtrack over everything, or post everything as constraints and backtrack only once at the end with label.	[reply] [d/l] [select]
Re^3: Seven by seven farming puzzle by ambrus (Abbot) on Feb 04, 2010 at 19:31 UTC
And this code below solves the six by six puzzle the pure constraint programming way. Notice how it first posts all constraints deterministically (without branching), and then the final `labeling` is the only thing that branches. Runs in SWI prolog in 489 seconds (on a fast computer) and prints the correct answer. % Six by six chessboard problem, see "http://www.perlmonks.org/?node_i +d=821272" % Finite domain constraint programming variaton :- use_module(library(clpfd)). row(R) :- length(R, 6), R ins 0 .. 1, row1(R). row1([X0, X1, X2, X3, X4, X5]) :- X0 + X1 + X2 + X3 + X4 + X5 #= 3. /* differ([X0, X1, X2, X3, X4, X5], [Y0, Y1, Y2, Y3, Y4, Y5], P) :- (X0 #\= Y0) #\/ (X1 #\= Y1) #\/ (X2 #\= Y2) #\/ ... % nah, I'll ty +po this somewhere / vdiffer(Xs, Ys) :- zip(lambda(arg(X, Y, N), N = (X #\= Y)), Xs, Ys, Ns), foldrz(lambda(arg(M, N, Q), Q = (M #\/ N)), 0, Ns, P), P. pairwise_vdiffer([]). pairwise_vdiffer([H \| T]) :- for(lambda_close(H, Hi, arg(K), vdiffer(Hi, K)), T), pairwise_vdiffer(T). boardp(B) :- length(B, 6), for(lambda(arg(R), row(R)), B), transpose(B, BT), for(lambda(arg(C), row1(C)), BT), pairwise_vdiffer(B), pairwise_vdiffer(BT). board(B) :- boardp(B), concat(B, Bfl), [R0 \| _] = B, labeling([leftmost], R0), write(board_debug0([R0])), nl, labeling([leftmost], Bfl). nboard(N) :- findall(x, board(_B), L), length(L, N). main :- nboard(N), write(N), nl. / [download] Read more... (4 kB)	[reply] [d/l] [select]
Re^4: Seven by seven farming puzzle by salva (Canon) on Feb 07, 2010 at 11:15 UTC
Re^5: Seven by seven farming puzzle by ambrus (Abbot) on Feb 14, 2010 at 16:46 UTC
Re^5: Seven by seven farming puzzle by ambrus (Abbot) on Feb 14, 2010 at 18:06 UTC
Re^5: Seven by seven farming puzzle by salva (Canon) on Feb 10, 2010 at 12:38 UTC