comment on

Here's my code--it's not particularly well doc'd.

The partitioning is done in the last set of nested for loops. I'm using the combined (&'d) bitmaps as the keys to the %partns hash and accumulating the item numbers as an array in the value.

That means deleting the previous key, after transfering the value to the new key, each time a new item is added to a partitition. Is also requires a next LABEL: which is the first time I've ever needed to use that, but it seems to work okay.

Omitting the -NODETAILS switch from the command line will cause %partns to be dumped to the screen.

The -W=w and -I=n switches control the powers of 26 used for the keywords/items generation.

The defaults are 2 & 2 which (I belive) is the dataset that halley offered for benchmarking, though I am building a different datstructire and (I suspect) that rand() may be producing different sequences on different builds/platforms?

#! perl -slw
use strict;
use List::Util qw[ max sum ];

our $W ||= 2; my( $W1, $W2, $S2 ) = ( 'a'x$W, 'z'x$W, 'b'x$W );
our $I ||= 2; my( $I1, $I2 )      = ( 'a'x$I, 'z'x$I );
our $NODETAILS;
our $SRAND ||= 12345;

srand( $SRAND ) if $SRAND;

my @allWords = 'k'.$W1 ... 'k'.$W2;                        ## All the 
+words
my @stopWords = 'k'.$W1 ..'k'.$S2;                        ## Those rej
+ected as 'noise'
my @keyWords = @allWords[ @stopWords .. @allWords ];    ## Only the on
+es we are interested in
print "Keywords: " . @keyWords;

## The item names
my @items = 'i'.$I1 .. 'i'.$I2;
print "Items: " . @items;

## Builds the same dataset as the example datasets generator
## Just represents it in a more compact and malleable form.
my @itemMap;
for my $item ( 0 .. $#items ) {
    my $count = 4 + int rand( 8 * (1+ (($W-2) * 26)) );
    my $vector = '';
    vec( $vector, int( rand() * rand() * @keyWords ), 1 ) = 1 while $c
+ount--;
    $itemMap[ $item ] = $vector;
}

## Partition items into nonoverlapping sets
my %partns;
OUTER: for my $itemno ( 0 .. $#items ) {
    for my $partn ( keys %partns ) {
        next unless $partns{ $partn }; ## Skip old ones
        my $common = unpack '%b*', ( $itemMap[ $itemno ] & $partn );
        next unless $common;

        my $newPartn = $partn & $itemMap[ $itemno ];## Form a new part
+ition key
        $partns{ $newPartn } = $partns{ $partn };    ## move over the 
+old item numbers
        push @{ $partns{ $newPartn } }, $itemno;    ## Add the new one
        delete $partns{ $partn } unless $partn eq $newPartn;    ## del
+ete the old partiton
        next OUTER;
    }
    ## If we got here, this item had no keywords in common with any ex
+isting partition
    ## So start a new one
    $partns{ $itemMap[ $itemno ] } = [ $itemno ];
}

unless( $NODETAILS ) {
    print "[ @$_ ]" for values %partns;
}
print scalar keys %partns, " partitons found.";

exit;
[download]

Examine what is said, not who speaks.

Silence betokens consent.

Love the truth but pardon error.

In reply to Re^9: algorithm for 'best subsets' by BrowserUk
in thread algorithm for 'best subsets' by halley

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.