Re: Help thinking about an alternate algorithm

So far all proposals seem to be O(N^2) and I cannot think of anything that has not a worst-case performance of that order. Stealing VinsWorldcoms example, one implementation would be

use strict;
use warnings;

my @students = (
    [qw(cap    shortsleeve jeans  leather flipflop)],
    [qw(beenie longsleeve  shorts cloth   highheel)],
    [qw(cap    tanktop     shorts rope    flat)],
    [qw(beenie longsleeve  shorts cloth   flipflop)]
);

my @best = ( -1 );
while( my $x = shift @students ) {
    Y: for my $y (@students) {
        my $diff = @$x;
        for( my $i = 0; $i<@$x; $i++ ) {
            $diff-- if $x->[$i] eq $y->[$i];
            next Y if $diff <= $best[0];
        }
        @best = ( $diff, $x, $y );
    }
}
print "@$_\n" for @best[1,2];
[download]

Not sure this is an improvement over what you have already.

I am still thinking whether any kind of sorting will improve performance. The aim would be to trigger the shortcut from the loop most often. It exits early if two arrays are similar but ONLY if one has already encountered two very different ones. My intuition is to say, one should put the columns with the least variation first, and also the rows with the most frequent items. But I cannot get to grips why this should be so....

Comment on Re: Help thinking about an alternate algorithm Download Code