Here is my entry, making use of some nice operations from Bit::Vector::Overload. It still has some rough spots in combining the final results, but it narrows things down very quickly. No combinatoric generators are used.

Update: Revised to collect the groups much better. I believe it will now do what Roy Johnson suggested. I didn't change any hashes to arrays, though.

Timing on my machine for a 676-item test case generated by the benchmark program halley did:

4.830u 0.000s 0:08.29 58.2% 0+0k 0+0io 420pf+0w

Update2: I also tried a 17,576 item example (with 'kaaa' ... 'kzzz' and 'iaaa' .. 'izzz'). It ran for one hour to find all groups from 2 up to 4 (the maximum available in this case). The timing is consistent with O(I^2 * log K), where I is the item count and K is the keyword count.

Update3: Inner loop optimization -- better ways to test for empty sets (is_empty) and count bits in sets (Norm). Went from one hour to 54 minutes on the biggest case.

#!/usr/bin/perl use strict; use warnings; use Bit::Vector::Overload; my %items = ( a => [ qw/one six/ ], b => [ qw/two three five/ ], c => [ qw/one two five/ ], d => [ qw/one seven five/ ], e => [ qw/one two five/ ], f => [ qw/one two four seven/ ], g => [ qw/one two five/ ], h => [ qw/one two three five/ ], ); my $icount = keys %items; # Form a mapping from items to bit positions. # Collect a list of bitmaps for combination work. my $ix = 0; my %ipos; my @ilst; for my $itm ( sort keys %items ) { $ipos{$itm} = $ix; my $set1 = new Bit::Vector($icount + 1); $set1->Bit_On($ix); push @ilst, $set1; ++$ix; } my %revipos = reverse(%ipos); # Form a mapping from keywords to bit positions. my $scount = 0; my %kpos; for my $itm ( sort keys %items ) { for my $keyw ( @{ $items{ $itm }} ) { $kpos{$keyw} = $scount++ if (!exists $kpos{$keyw}); } } # Also form a reverse index for later printing. my %revkpos = reverse(%kpos); # Form bit vectors with ones in the keyword positions, # one for every item. my %keyword_vecs; my @lst1; for my $itm ( sort keys %items ) { my $set0 = new Bit::Vector($scount + 1); $keyword_vecs{$itm} = $set0; for my $keyw ( @{ $items{ $itm }} ) { $set0->Bit_On($kpos{$keyw}); } # hold both sets - items and keywords together. push @lst1,[$ilst[$ipos{$itm}], $set0]; } # Must have at least matching pairs. my @lst2; my %same_merger; # want to merge combos with common intersections. my $i; my $j; my $imax = @lst1; for ($i = 0; $i < $imax; ++$i) { for ($j = $i+1; $j < $imax; ++$j) { my $kcombo = $lst1[$i]->[1] & $lst1[$j]->[1]; next if $kcombo->is_empty(); next if $kcombo->Norm() < 2; my $k = "$kcombo"; if (exists $same_merger{$k}) { $same_merger{$k}->[0] |= $lst1[$i]->[0]; $same_merger{$k}->[0] |= $lst1[$j]->[0]; } else { $same_merger{$k} = [ ($lst1[$i]->[0] | $lst1[$j]->[0]), $kcom +bo ]; } } } for (keys %same_merger) { my $kref = $same_merger{$_}; my $icombo = $kref->[0]; my $kcombo = $kref->[1]; my @inames = @revipos{ $icombo->Index_List_Read() }; my @knames = @revkpos{ $kcombo->Index_List_Read() }; # Result could be externally sorted, or sort the lst2 array and the +n print. print scalar(@inames)," : @inames combo is @knames","\n"; push @lst2,[$icombo, $kcombo]; }

In reply to Re: algorithm for 'best subsets' by tall_man
in thread algorithm for 'best subsets' by halley

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.