
Here's an artificial test data generator, for those of you who are interested in doing your own benchmarking before publishing your methods.

use strict;
use warnings;
use Data::Dumper;

my %Items;

sub build_test_data
{
    # Fixed seed, so the test case is reproducible.
    srand(12345);

    # Sorted by prevalence.  Keyword 'kaa' is way more common than 'kzz'.
    my @Keywords = 'kaa' .. 'kzz';

    # Each node is associated with an asciibetical list of unique keywords.
    # We groom out the top keywords which are basically noise.
    for my $xx ('iaa' .. 'izz')
    {
        my $count = int(rand(8)) + 4;
        $Items{$xx}{$Keywords[ int(rand()*rand()*@Keywords) ]}++
            while $count--;
        delete $Items{$xx}{$_} for 'kaa'..'kab';
        $Items{$xx} = [ sort keys %{$Items{$xx}} ];
    }

    # Dump the data only when called with an argument, e.g. build_test_data(1).
    return unless @_;
    print Dumper \%Items; # lots of raw data!
}

build_test_data();
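
In case the rand()*rand() trick looks odd: the product of two uniform random numbers piles up near zero, so low indexes (the "common" keywords) get picked far more often than high ones. Here's a small sketch that shows the skew on its own. The skewed_histogram() helper and the bucket count of 10 are just for illustration and aren't part of the generator above.

```perl
use strict;
use warnings;

# Count how often each index in 0 .. $n-1 is drawn by
# int(rand() * rand() * $n), the same expression the generator uses.
sub skewed_histogram {
    my ($n, $draws) = @_;
    my @hits = (0) x $n;
    $hits[ int(rand() * rand() * $n) ]++ for 1 .. $draws;
    return \@hits;
}

srand(12345);
my $hits = skewed_histogram(10, 100_000);

# One line per index: index, then number of draws that landed on it.
printf "%2d %6d\n", $_, $hits->[$_] for 0 .. $#$hits;
```

With 10 buckets, about a third of the draws should land on index 0 and well under 1% on index 9, which is the "way more common" effect the comment describes.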

Update: Here's a useful results format:

tuples of 3:
6 kaa kdf kea
6 kab kaf kka
4 kad kfa kfg
 ...
tuples of 2:
9 kad kfa
8 kaj kda
8 kaj kda
 ...
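
For anyone who wants a baseline to compare against, here's a rough brute-force sketch that prints results in that format. It assumes the leading number is the count of items that contain every keyword in the tuple (the format above doesn't spell that out), and it uses a tiny made-up %sample in place of the real %Items. The combinations() and count_tuples() helpers are hypothetical names, not anyone's posted method.

```perl
use strict;
use warnings;

# Return every $k-element combination of @list, preserving list order.
sub combinations {
    my ($k, @list) = @_;
    return ([]) if $k == 0;
    return ()   if @list < $k;
    my ($head, @tail) = @list;
    return (
        (map { [ $head, @$_ ] } combinations($k - 1, @tail)),
        combinations($k, @tail),
    );
}

# For each $k-keyword tuple, count how many items contain all of it.
# ASSUMPTION: this is what the leading number in the results means.
sub count_tuples {
    my ($items, $k) = @_;
    my %count;
    for my $keywords (values %$items) {
        $count{"@$_"}++ for combinations($k, @$keywords);
    }
    return \%count;
}

# Made-up data in the same shape as %Items after build_test_data():
# item name => sorted list of unique keywords.
my %sample = (
    iaa => [qw(kad kfa kfg)],
    iab => [qw(kad kfa kzz)],
    iac => [qw(kad kfa kfg)],
);

for my $k (3, 2) {
    my $count = count_tuples(\%sample, $k);
    print "tuples of $k:\n";
    print "$count->{$_} $_\n"
        for sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count;
}
```

This enumerates every combination per item, so it's only sensible for short keyword lists like the ones the generator produces (at most 11 keywords per item). It's meant as a correctness check for faster methods, not as a contender.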

--
[ e d @ h a l l e y . c c ]

In reply to Re: algorithm for 'best subsets' by halley
in thread algorithm for 'best subsets' by halley
