in reply to decomposing binary matrices

At the risk of wasting your time a second time, I'm pretty sure that this method works. The method is based upon a half remembered technique for simplifying complex boolean conditions that I learnt at college and have never used since. You draw up a truth table for states and then swap whole columns or whole rows until the true values group together. I seem to recall that it didn't matter how many times you swapped things around so long as you always swapped whole columns or whole rows at a time

My POC below basically consists of sorting the matrix both horizontally and vertically, and then picking out the sets with a equal number of 'leading zeros'. The problem I had was tracking the positions the 1's came from. The proper way would be to use a matrix of anonymous arrays (or objects), each containing the row, column and boolean value. For simplicity, I used arrays of strings, replacing each '1' by the letter of it's starting column, and appending the row numbers (starting at 1) to each string. I then sort these strings, then tranforms the 'matrix' into another set of strings and sort again. AT this point you extract and retain the 'row numbers' item from the array.

What you end up with is the smallest group (least 1s/most 0s) sorted to the top. You count the number of leading zeros in group (min of the first two items) and select all the items with at least that number of leading zeros and place them into the first group. Then get the min leading zeros of the first two of the remaining items, and repeat. The final group will be the 'leftovers'.

The following shows the process step by step for the OP example.

c:\test>600418-2.pl This input 00CD01 A0CDE2 0B0D03 AB00E4 0BCD05 sorted 00CD01 0B0D03 0BCD05 A0CDE2 AB00E4 Tranformed looks like this 000AA 0BB0B C0CC0 DDDD0 000EE 13524 sorted 000AA 000EE 0BB0B 13524 C0CC0 DDDD0 These are the sets where the letters denote the columns of the origina +l matrix (0 mean column not used in this set). And the numbers above, the rows they came from. 13524 ## Column numbers from the original data. Trim leading zeros. 000AA ## Read this as; Group one contains 000EE ## columns A & E from rows 2 & 4. 13524 ## Leftovers group. Trim trailing columns used by earlier group +s. 0BB0B ## Columns B, C, & D from Rows 1, 3 & 5. C0CC0 DDDD0

This works as is for the other examples you've supplied (see commented out data blocks).

I do remember from those far off college days that for some complex examples, it was possible for the grouping to 'wrap over the edges'. That is, if you draw the matrix on paper, and wrap it into a cylinder to bring the left and right edges together, a group could cross the boundary. The same was possible for the top & bottom edges.

I think that this would be catered for by repeating the sort/transform/sort and select process on the smaller groups, but I haven't taken it that far yet. Unless your matrices are huge, repeating the process to subdivide the smaller groups shouldn't be a problem.

#! perl -slw use strict; sub xform { my @out; for my $in ( @_ ) { $out[ $_ ] .= substr $in, $_, 1 for 0 .. length( $in ) -1; } return @out; } my @grid = ( "00CD01", "A0CDE2", "0B0D03", "AB00E4", "0BCD05", ); #my @grid=( "000DEF1", "000DEF2", "000DE03", "ABCDEF4", "ABCDEF5", "AB +0DEF6", ); #my @grid=( "000DE01", "000D0F2", "0000EF3", "AB00004", "A0C0005", "AB +00006", ); #my @grid=( "0B0D001", "000DE02", "0B00E03", "A0000F4", "A0C0005", "A0 +000F6", ); print "This input\n"; print "\t$_" for @grid; @grid = sort @grid; print "sorted\n"; print "\t$_" for @grid; my @xformed = xform @grid; print "\nTranformed looks like this\n"; print "\t$_" for @xformed; @xformed = sort @xformed; print "sorted\n"; print "\t$_" for @xformed; ## extra column label item. my( $label ) = grep /1/, @xformed; @xformed = grep !/1/, @xformed; my @subsets; while( @xformed ) { my $mask = $xformed[ 0 ] | $xformed[ 1 ]; if( $mask =~ m[(^0+)] ) { my $count = length( $1 ); ## Length of common zero prefix push @{ $subsets[ @subsets ] }, shift( @xformed ), shift( @xfo +rmed ); while( @xformed and $xformed[ 0 ] =~ m[(^0+)] and length( $1 ) == $count ) { push @{ $subsets[ -1 ] }, shift @xformed; } } else { push @subsets, []; push @{ $subsets[ -1 ] }, shift @xformed while @xformed; } } print <<'EOS'; These are the sets where the letters denote the columns of the origina +l matrix (0 mean column not used in this set). And the numbers above, the rows they came from. EOS print join "\n", $label, @$_, "\n" for @subsets;