in reply to algorithm for 'best subsets'
    use Data::Dumper;    # needed for the optional dump below

    my %Items;

    sub build_test_data
    {
        # reproducible case
        srand(12345);

        # Sorted by prevalence.  Keyword 'kaa' is way more common than 'kzz'.
        my @Keywords = ('kaa' .. 'kzz');

        # Each node is associated with an asciibetical list of unique keywords.
        # We groom out the top keywords which are basically noise.
        for my $xx ('iaa' .. 'izz')
        {
            my $count = int(rand(8)) + 4;
            $Items{$xx}{$Keywords[ int(rand()*rand()*@Keywords) ]}++
                while $count--;
            delete $Items{$xx}{$_} for 'kaa' .. 'kab';
            $Items{$xx} = [ sort keys %{$Items{$xx}} ];
        }

        # Pass any argument to dump the generated data.
        return unless @_;
        print Dumper \%Items;    # lots of raw data!
    }

    build_test_data();

Update: Here's a useful results format:

    tuples of 3:
      6 kaa kdf kea
      6 kab kaf kka
      4 kad kfa kfg
      ...
    tuples of 2:
      9 kad kfa
      8 kaj kda
      8 kaj kda
      ...
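For anyone who wants to produce output in that shape, here is a minimal brute-force sketch. It assumes the leading number on each line is the count of items whose keyword list contains the whole tuple; that reading is a guess, not something stated above. The helpers `combinations`, `count_tuples` and `report`, the `$min` cutoff, and the tiny `%demo` data set are all hypothetical names made up for illustration. Enumerating every k-combination per item gets expensive quickly as k or the list length grows, so treat it as a baseline rather than a recommended algorithm.

```perl
use strict;
use warnings;

# Hypothetical helper: every k-element combination of a list, in list order.
# Each combination is returned as an array ref.
sub combinations {
    my ($k, @list) = @_;
    return [] if $k == 0;            # exactly one empty combination
    my @out;
    while (@list >= $k) {
        my $head = shift @list;
        push @out, [ $head, @$_ ] for combinations($k - 1, @list);
    }
    return @out;
}

# Count how many items contain each k-keyword tuple.
# $items maps an item name to a sorted array ref of unique keywords,
# the same shape build_test_data() leaves in %Items.
sub count_tuples {
    my ($items, $k) = @_;
    my %count;
    for my $keywords (values %$items) {
        $count{"@$_"}++ for combinations($k, @$keywords);
    }
    return \%count;
}

# Format the tuples seen in at least $min items, most common first.
sub report {
    my ($items, $k, $min) = @_;
    my $count = count_tuples($items, $k);
    my @lines = map  { "  $count->{$_} $_" }
                sort { $count->{$b} <=> $count->{$a} || $a cmp $b }
                grep { $count->{$_} >= $min } keys %$count;
    return join "\n", "tuples of $k:", @lines;
}

# Made-up data, only to show the output shape.
my %demo = (
    iaa => [qw(kad kfa kfg)],
    iab => [qw(kad kfa)],
    iac => [qw(kad kfa kfg kzz)],
);
print report(\%demo, $_, 2), "\n" for 3, 2;
```

To try it on the generated test data, call `report(\%Items, 3, 2)` after `build_test_data()` has run.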
--
[ e d @ h a l l e y . c c ]
Re^2: algorithm for 'best subsets'
by fizbin (Chaplain) on Mar 03, 2005 at 16:36 UTC
by halley (Prior) on Mar 03, 2005 at 21:19 UTC
by Limbic~Region (Chancellor) on Mar 03, 2005 at 21:27 UTC
by fizbin (Chaplain) on Mar 03, 2005 at 22:57 UTC
by BrowserUk (Patriarch) on Mar 03, 2005 at 22:04 UTC