Shouldn't you have MIN instead of MAX? You are, after all, interested in the things with the most in common.

Jumping on tall_man's idea to use Bit::Vector::Overload (and shamelessly stealing your data generator), here's a new solution. It's reasonably quick (about 15x faster than yours on my slow machine, though a chunk of the difference is printing time) to generate all the tuples and spit them out, nicely ordered by cardinality.

There is much less output, because only tuples that actually represent the intersection of some pair of elements are included. When such a tuple is found, then the rest of the elements are checked to see if they should be included with it, so that the list for the tuple is complete.

use strict; use warnings; use Bit::Vector::Overload; use List::Util 'shuffle'; my $MIN = 3; ## Some keywords my @keywords = qw[ zero one two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen ]; ## Generate some test data my %items = map{ $_ => [ @keywords[ ( shuffle( 0 .. $#keywords ) )[ 0 .. rand @keyw +ords ] ] ] } 'a' .. 'z'; print "The item list:\n"; print "$_ => @{ $items{ $_ } }\n" for sort keys %items; print '=' x 30, "\n"; # First, build an index of the distinct values by building a hash # (anonymous) and taking the keys my @val_index = keys %{{map {($_ => undef)} map {@$_} values %items}}; # and a reverse lookup my %rev_val_index = map {($val_index[$_] => $_)} 0..$#val_index; # Now represent each entry as a bit vector my %vectors; while (my ($k, $v) = each %items) { $vectors{$k} = new Bit::Vector(scalar(@val_index)); $vectors{$k}->Bit_On($_) for @rev_val_index{@$v}; } # Compare elements pairwise and add each element # to an AoHoH indexed by size of tuple and tuple member list # if they have elements in common my @intersections; my @item_keys = keys %items; for my $i (0..$#item_keys-1) { for my $j ($i+1..$#item_keys) { my $intersect = $vectors{$item_keys[$i]} & $vectors{$item_keys[$j] +}; my @common_elements = @val_index[$intersect->Index_List_Read]; next if @common_elements < $MIN; my $name_list = join ' ', @common_elements; @{$intersections[scalar @common_elements]{$name_list}}{@item_keys[ +$i,$j]} = (); # Include any higher-order matches in lower-order matches for my $k (0..$#item_keys) { next if $k == $i or $k == $j; my $new_isect = $intersect & $vectors{$item_keys[$k]}; if ($new_isect eq $intersect) { $intersections[scalar @common_elements]{$name_list}{$item_keys +[$k]} = (); } } } } for ($MIN..$#intersections) { next unless keys %{$intersections[$_]}; print "=== $_-tuples of common elements:\n"; while (my ($k, $v) = each %{$intersections[$_]}) { print "$k: ", join(', ', sort keys %$v), "\n"; } }

Caution: Contents may have been coded under pressure.

In reply to Re^2: algorithm for 'best subsets' by Roy Johnson
in thread algorithm for 'best subsets' by halley

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.