in reply to Re^6: algorithm for 'best subsets'
in thread algorithm for 'best subsets'
It's definitely a very valuable pass to make. It cuts down the combinatorics immensely, especially (as you said) if you can remove the most common words from the picture.
I implemented it myself, because I couldn't work out how to accumulate the partiton items on the fly using Graph::UnionFind--for good reason it seems:).
My homebrew implementation finds
[22:56:47.23] P:\test>436050-2 -W=3 -I=2 -NODETAILS Keywords: 16873 Items: 676 141 partitons found. [22:56:47.81] P:\test>436050-2 -W=4 -I=2 -NODETAILS Keywords: 438697 Items: 676 292 partitons found. [22:57:10.23] P:\test>436050-2 -W=3 -I=3 -NODETAILS Keywords: 16873 Items: 17576 745 partitons found. [22:57:28.48] P:\test>
Which is good news because it means that it scales well in both directions.
I tried the 438897K / 438697I combination, but even with avoiding the memory requirement of Gr::UnF's hashes, it still requires more memory than I have, and my disk is badly fragged so I am defragging it before trying again.
Once you have partitioned, would you agree that it makes sense to do a pairwise combinations of the partitions and isolating those pairs within the partition that have nothing in common?
|
---|
Replies are listed 'Best First'. | |
---|---|
Re^8: algorithm for 'best subsets'
by tall_man (Parson) on Mar 04, 2005 at 23:35 UTC | |
by BrowserUk (Patriarch) on Mar 05, 2005 at 00:09 UTC | |
by BrowserUk (Patriarch) on Mar 08, 2005 at 01:19 UTC |