Re^8: algorithm for 'best subsets'

Replies are listed 'Best First'.
Re^9: algorithm for 'best subsets' by BrowserUk (Patriarch) on Mar 05, 2005 at 00:09 UTC
Here's my code--it's not particularly well doc'd. The partitioning is done in the last set of nested for loops. I'm using the combined (&'d) bitmaps as the keys to the `%partns` hash and accumulating the item numbers as an array in the value. That means deleting the previous key, after transfering the value to the new key, each time a new item is added to a partitition. Is also requires a `next LABEL:` which is the first time I've ever needed to use that, but it seems to work okay. Omitting the `-NODETAILS` switch from the command line will cause `%partns` to be dumped to the screen. The `-W=w` and `-I=n` switches control the powers of 26 used for the keywords/items generation. The defaults are 2 & 2 which (I belive) is the dataset that halley offered for benchmarking, though I am building a different datstructire and (I suspect) that `rand()` may be producing different sequences on different builds/platforms? #! perl -slw use strict; use List::Util qw[ max sum ]; our $W \|\|= 2; my( $W1, $W2, $S2 ) = ( 'a'x$W, 'z'x$W, 'b'x$W ); our $I \|\|= 2; my( $I1, $I2 ) = ( 'a'x$I, 'z'x$I ); our $NODETAILS; our $SRAND \|\|= 12345; srand( $SRAND ) if $SRAND; my @allWords = 'k'.$W1 ... 'k'.$W2; ## All the +words my @stopWords = 'k'.$W1 ..'k'.$S2; ## Those rej +ected as 'noise' my @keyWords = @allWords[ @stopWords .. @allWords ]; ## Only the on +es we are interested in print "Keywords: " . @keyWords; ## The item names my @items = 'i'.$I1 .. 'i'.$I2; print "Items: " . @items; ## Builds the same dataset as the example datasets generator ## Just represents it in a more compact and malleable form. my @itemMap; for my $item ( 0 .. $#items ) { my $count = 4 + int rand( 8 * (1+ (($W-2) * 26)) ); my $vector = ''; vec( $vector, int( rand() * rand() * @keyWords ), 1 ) = 1 while $c +ount--; $itemMap[ $item ] = $vector; } ## Partition items into nonoverlapping sets my %partns; OUTER: for my $itemno ( 0 .. $#items ) { for my $partn ( keys %partns ) { next unless $partns{ $partn }; ## Skip old ones my $common = unpack '%b*', ( $itemMap[ $itemno ] & $partn ); next unless $common; my $newPartn = $partn & $itemMap[ $itemno ];## Form a new part +ition key $partns{ $newPartn } = $partns{ $partn }; ## move over the +old item numbers push @{ $partns{ $newPartn } }, $itemno; ## Add the new one delete $partns{ $partn } unless $partn eq $newPartn; ## del +ete the old partiton next OUTER; } ## If we got here, this item had no keywords in common with any ex +isting partition ## So start a new one $partns{ $itemMap[ $itemno ] } = [ $itemno ]; } unless( $NODETAILS ) { print "[ @$_ ]" for values %partns; } print scalar keys %partns, " partitons found."; exit; [download] Examine what is said, not who speaks. Silence betokens consent. Love the truth but pardon error.	[reply] [d/l] [select]
Re^9: algorithm for 'best subsets' by BrowserUk (Patriarch) on Mar 08, 2005 at 01:19 UTC
I assume you noticed that I was ANDing when I should have been ORing? Examine what is said, not who speaks. Silence betokens consent. Love the truth but pardon error.	[reply] [d/l]