The first set of bit vectors was not used in the code I posted, but they are used in another version of the code I was working on. If I operate on them in the same way I do the second set of bit vectors, I should get a complete set of the items that were combined into each partition. It's useful for testing and for using the partitions later.

I suppose you could use Graph::UnionFind by itself, but it would be horrendously inefficient. You would have to compare each pair of items to see if they intersected -- O(N^2). With my method (once I get it working right) I only have to test against the combined bit-set of each partition -- O(N*P) where P is the average number of partitions.

If you really want to scale to tens of thousands of items, I think this is the right approach. However, trying to work from the outside of Graph::UnionFind was probably a mistake. I need to create a modified version that stores the bit-vectors as part of the internal structure. Then, as it re-organizes itself, it would be able to keep the information accurately.

UnionFind uses a forest of n-ary trees which have only parent pointers. You can easily go from a child to the root, but not vice-versa. It can also "flatten" the tree by making more of the pointers go directly to the top. I believe the flattening is what is causing me to lose information.

There are many references to the "union find" algorithm, some with animated Java applets. Here is one.


In reply to Re^4: algorithm for 'best subsets' by tall_man
in thread algorithm for 'best subsets' by halley

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.