http://qs1969.pair.com?node_id=600519

in reply to decomposing binary matrices

At the risk of wasting your time a second time, I'm pretty sure that this method works. The method is based upon a half remembered technique for simplifying complex boolean conditions that I learnt at college and have never used since. You draw up a truth table for states and then swap whole columns or whole rows until the true values group together. I seem to recall that it didn't matter how many times you swapped things around so long as you always swapped whole columns or whole rows at a time

My POC below basically consists of sorting the matrix both horizontally and vertically, and then picking out the sets with a equal number of 'leading zeros'. The problem I had was tracking the positions the 1's came from. The proper way would be to use a matrix of anonymous arrays (or objects), each containing the row, column and boolean value. For simplicity, I used arrays of strings, replacing each '1' by the letter of it's starting column, and appending the row numbers (starting at 1) to each string. I then sort these strings, then tranforms the 'matrix' into another set of strings and sort again. AT this point you extract and retain the 'row numbers' item from the array.

What you end up with is the smallest group (least 1s/most 0s) sorted to the top. You count the number of leading zeros in group (min of the first two items) and select all the items with at least that number of leading zeros and place them into the first group. Then get the min leading zeros of the first two of the remaining items, and repeat. The final group will be the 'leftovers'.

The following shows the process step by step for the OP example.

```c:\test>600418-2.pl
This input

00CD01
A0CDE2
0B0D03
AB00E4
0BCD05
sorted

00CD01
0B0D03
0BCD05
A0CDE2
AB00E4

Tranformed looks like this

000AA
0BB0B
C0CC0
DDDD0
000EE
13524
sorted

000AA
000EE
0BB0B
13524
C0CC0
DDDD0
These are the sets where the letters denote the columns of the origina
+l matrix
(0 mean column not used in this set).
And the numbers above, the rows they came from.

13524  ## Column numbers from the original data. Trim leading zeros.
000AA  ## Read this as; Group one contains
000EE  ## columns A & E from rows 2 & 4.

13524  ## Leftovers group. Trim trailing columns used by earlier group
+s.
0BB0B  ## Columns B, C, & D from Rows 1, 3 & 5.
C0CC0
DDDD0

This works as is for the other examples you've supplied (see commented out data blocks).

I do remember from those far off college days that for some complex examples, it was possible for the grouping to 'wrap over the edges'. That is, if you draw the matrix on paper, and wrap it into a cylinder to bring the left and right edges together, a group could cross the boundary. The same was possible for the top & bottom edges.

I think that this would be catered for by repeating the sort/transform/sort and select process on the smaller groups, but I haven't taken it that far yet. Unless your matrices are huge, repeating the process to subdivide the smaller groups shouldn't be a problem.

```#! perl -slw
use strict;

sub xform {
my @out;
for my \$in ( @_ ) {
\$out[ \$_ ] .= substr \$in, \$_, 1  for 0 .. length( \$in ) -1;
}
return @out;
}

my @grid = ( "00CD01", "A0CDE2", "0B0D03", "AB00E4", "0BCD05", );
#my @grid=( "000DEF1", "000DEF2", "000DE03", "ABCDEF4", "ABCDEF5", "AB
+0DEF6", );
#my @grid=( "000DE01", "000D0F2", "0000EF3", "AB00004", "A0C0005", "AB
+00006", );
#my @grid=( "0B0D001", "000DE02", "0B00E03", "A0000F4", "A0C0005", "A0
+000F6", );

print "This input\n";
print "\t\$_" for @grid;

@grid = sort @grid;
print "sorted\n";
print "\t\$_" for @grid;

my @xformed = xform @grid;

print "\nTranformed looks like this\n";
print "\t\$_" for @xformed;

@xformed = sort @xformed;
print "sorted\n";
print "\t\$_" for @xformed;

## extra column label item.
my( \$label ) = grep /1/, @xformed;
@xformed = grep !/1/, @xformed;

my @subsets;
while( @xformed ) {
my \$mask = \$xformed[ 0 ] | \$xformed[ 1 ];
if( \$mask =~ m[(^0+)]  ) {
my \$count = length( \$1 ); ## Length of common zero prefix

push @{ \$subsets[ @subsets ] }, shift( @xformed ), shift( @xfo
+rmed );
while( @xformed
and \$xformed[ 0 ] =~ m[(^0+)]
and length( \$1 ) == \$count
) {
push @{ \$subsets[ -1 ] }, shift @xformed;
}
}
else {
push @subsets, [];
push @{ \$subsets[ -1 ] }, shift @xformed while @xformed;
}
}
print <<'EOS';

These are the sets where the letters denote the columns of the origina
+l matrix
(0 mean column not used in this set).
And the numbers above, the rows they came from.
EOS

print join "\n", \$label, @\$_, "\n" for @subsets;

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^2: decomposing binary matrices
by hv (Parson) on Feb 16, 2007 at 22:56 UTC

I think that approach has possibilities, though it might need some modification - there is no guarantee that the row with fewest bits is separable. Consider, say:

```A:  1 1 0 0 0
B:  1 0 1 1 0
C:  1 0 1 1 0
D:  1 0 1 1 0
E:  1 1 1 1 1
.. from which {B, C, D} is a separable submatrix. I'd have to think further whether I can easily construct an example that defeats the approach on both rows and columns simultaneously - it'd certainly need a larger matrix (at least 7x7 I think), but I think the same general idea could throw up a proper counterexample.

At the risk of wasting your time ...

Not a waste by any means - many of the suggestions so far could maybe be adaptable to a full solution. It's just a question of whether the adapted version would end up any quicker than going brute force in the first place. :)

Hugo

I just ran that example through the code I posted (Note: My code uses letters for columns not rows as you have). The groupings produced:

```23415
0000E
000BB

23415
AAAAA
CCC0C
DDD0D

gives rows 1 & 5 (your A & E), as a group which when subtracted from the remainer leaves rows 2, 3 & 4 (your {BCD}) as a group which is what you are after?

Full output from the unchanged program (except for inserting the new data (first line below)):

```# my @grid = ( "AB0001","A0CD02","A0CD03","A0CD04","ABCDE5" );

c:\test>600418-2.pl
This input

AB0001
A0CD02
A0CD03
A0CD04
ABCDE5
sorted

A0CD02
A0CD03
A0CD04
AB0001
ABCDE5

Tranformed looks like this

AAAAA
000BB
CCC0C
DDD0D
0000E
23415
sorted

0000E
000BB
23415
AAAAA
CCC0C
DDD0D
These are the sets where the letters denote the columns of the origina
+l matrix
(0 mean column not used in this set).
And the numbers above, the rows they came from.

23415
0000E
000BB

23415
AAAAA
CCC0C
DDD0D
```  my @grid = ( "ABCD1", "A0002", "0BCD3", "0BCD4");