Pseudocode for my version of the algorithm (without using G::UF or any graph concept at all), if I understand it correctly. By "kbits" I mean a keyword bit vector. By "parts" I mean partitions. Am I missing something?
# bag of parts # for each item # for each existing part # if this item's kbits intersect this part's kbits, # union up the keywords # for all other parts # if this part's kbits intersect that part's kbits, # merge that part into this part # prune parts emptied by merger # create a new part if no intersections found

It seems to work, and scans my whole current database of 5810 keywords in 6628 items in about three seconds.

Unfortunately, it grows to about 5 partitions maximum, and by the time it's done, it has merged back everything into one partition. I think that's the fault of my keywords pruning, though. Even though I filter out the 100 most boring prepositions and articles, I need to find out the remaining words that cause the most mergers...

Update: How depressing. Not only is 'war' the most common keyword in modern history, but it appears to be the common thread amongst all of the events as well; removing that one keyword broke the historical context into five separate partitions.

--
[ e d @ h a l l e y . c c ]


In reply to Re^5: algorithm for 'best subsets' by halley
in thread algorithm for 'best subsets' by halley

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.