in reply to Help thinking about an alternate algorithm
So far all proposals seem to be O(N^2) and I cannot think of anything that has not a worst-case performance of that order. Stealing VinsWorldcoms example, one implementation would be
use strict; use warnings; my @students = ( [qw(cap shortsleeve jeans leather flipflop)], [qw(beenie longsleeve shorts cloth highheel)], [qw(cap tanktop shorts rope flat)], [qw(beenie longsleeve shorts cloth flipflop)] ); my @best = ( -1 ); while( my $x = shift @students ) { Y: for my $y (@students) { my $diff = @$x; for( my $i = 0; $i<@$x; $i++ ) { $diff-- if $x->[$i] eq $y->[$i]; next Y if $diff <= $best[0]; } @best = ( $diff, $x, $y ); } } print "@$_\n" for @best[1,2];
Not sure this is an improvement over what you have already.
I am still thinking whether any kind of sorting will improve performance. The aim would be to trigger the shortcut from the loop most often. It exits early if two arrays are similar but ONLY if one has already encountered two very different ones. My intuition is to say, one should put the columns with the least variation first, and also the rows with the most frequent items. But I cannot get to grips why this should be so....
|
|---|