This does all the pairs, triples, quads, quins and sextuplets in 6 seconds, producing 60,000+ lines in the process.

With the 615,000 lines from 2 .. 10 keywords combinations taking around 2 1/2 minutes.

My simple combinations generator chews a fair bit of memory though. An iterator would be prefereable.

It relies on each item being representable by a single byte--which is okay upto 255 items if you map the real names to chars.

The output format:

6 two four five nine seventeen eighteen => aekrsy

says that the 6 keywords 'two', 'four', etc. all appeared in each of items a, e, k, r, s, & y.

#! perl -slw use strict; use List::Util qw[ shuffle ]; use Data::Dumper; our $MAX ||= 6; ## A better Combinations iterator wouldn;t go amis. sub Cnr{ my( $n, @r ) = shift; return [] unless $n--; for my $x ( 0 .. ($#_ - $n) ) { push @r, map{ [ $_[$x], @$_ ] } Cnr( $n, @_[ ($x + 1) .. $#_ ] ); } return @r; } ## Some keywords my @keywords = qw[ zero one two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen ]; ## Generate some test data my %items = map{ $_ => [ @keywords[ ( shuffle( 0 .. $#keywords ) )[ 0 .. rand @keyw +ords ] ] ] } 'a' .. 'z'; print "$_ => @{ $items{ $_ } }" for sort keys %items; ## Build a byte mapped index ## Keywords to items my %index = map{ $_ => chr(0) x 26 } @keywords ; for my $item ( keys %items ) { substr $index{ $_ }, ord($item) - ord( 'a' ), 1, $item for @{ $it +ems{ $item } }; } my %combs; ## for combination of 2, 3, 4 .. keywords. for my $n ( 2 .. $MAX ) { print "n-ary:$n"; ## Generate the pairs triples etc for my $comb ( Cnr $n, @keywords ) { ## AND the byte maps together $combs{ "$n @$comb" } = $index{ $comb->[ 0 ] } & $index{ $comb +->[ 1 ] }; $combs{ "$n @$comb" } &= $index{ $comb->[ $_ ] } for 2 .. $n-1 +; ## and strip the nulls. $combs{ "$n @$comb" } =~ tr[\0][]d; ## Delete it, if no items with these keywords were found. delete $combs{ "$n @$comb" } unless length $combs{ "$n @$comb" + }; } } ## Display or file the results. print "$_ => $combs{ $_ }" for sort{ local $^W; 0+$a <=> 0+$b } keys % +combs; __END__ a => six seven seventeen two sixteen nineteen three eighteen fifteen t +welve eleven fourteen thirteen eight nine four one five ten zero b => seven ten five two zero six one nine three seventeen c => eleven four eighteen twelve fifteen seventeen seven six fourteen +nine zero sixteen three one d => fifteen ten sixteen seventeen nineteen five two twelve four zero +thirteen eighteen eleven e => zero eight seventeen eleven fourteen ten fifteen four eighteen ni +neteen three two seven one five nine f => ten five nine g => fifteen nineteen seventeen eight ten seven h => four eighteen one fourteen fifteen three seven zero twelve thirte +en seventeen nineteen nine five i => one two j => seven five thirteen twelve one eighteen nineteen three k => ten two fifteen six nineteen four seven nine twelve one eleven ei +ghteen seventeen five l => one five eighteen fourteen thirteen nineteen nine six four eleven + zero eight seven twelve m => nineteen thirteen seven four ten three nine zero two fourteen eig +ht five sixteen six twelve n => eleven eighteen nineteen ten three twelve fifteen one fourteen ei +ght thirteen six seven o => eleven eight four eighteen seventeen sixteen nine twelve nineteen + zero thirteen three p => zero ten twelve eleven eighteen two sixteen q => ten fifteen sixteen eighteen eleven thirteen zero seven nineteen +three six seventeen fourteen one eight two four r => zero nineteen fifteen thirteen one twelve three seventeen six fou +r eight sixteen seven eleven nine five eighteen ten two fourteen s => ten two zero fifteen five nine nineteen three sixteen one eightee +n seventeen eight fourteen eleven seven six thirteen twelve four t => seventeen four one ten zero nine fourteen u => fifteen two v => seven w => fifteen eighteen seven nineteen x => eight two y => thirteen twelve fourteen fifteen seventeen sixteen three two five + ten zero nine one four eighteen nineteen z => two nine seven sixteen one six zero fourteen seventeen four ninet +een twelve fifteen eight eighteen n-ary:2 n-ary:3 n-ary:4 n-ary:5 n-ary:6 2 six seventeen => abckqrsz 2 six nine => abcklmrsz 2 twelve nineteen => adhjklmnorsyz ... 3 zero thirteen fifteen => adhqrsy 3 eleven fourteen eighteen => acelnqrs 3 one ten eleven => aeknqrs ... 4 two three eight eleven => aeqrs 4 four seven ten fifteen => aekqrs 4 zero ten eleven sixteen => adpqrs ... 5 zero two five seven fifteen => aers 5 three six eight fourteen fifteen => anqrs 5 one ten eleven fourteen eighteen => aenqrs ... 6 two four five nine seventeen eighteen => aekrsy 6 one five six nine twelve eighteen => aklrs 6 one four six seven sixteen eighteen => acqrsz

Examine what is said, not who speaks.
Silence betokens consent.
Love the truth but pardon error.

In reply to Re: algorithm for 'best subsets' by BrowserUk
in thread algorithm for 'best subsets' by halley

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.