This happens to be a classic computational chemistry problem. When searching large chemical databases, compound characteristics are hashed into long bit strings so that cheap bitwise comparisons can screen out most candidates and minimize the number of expensive graph comparisons. One common pairwise bitstring comparison is the Tanimoto coefficient.

Tanimoto = count(A & B) / count(A | B)

where count() is the number of set bits. I.e., the number of characteristics in common divided by the number of characteristics found in either. The bias is that positive information counts higher than negative: a bit set in both strings raises the score, while a bit unset in both is ignored. This is used to look for closest relatives, to find a diverse subset, and to compare the diversity of collections (by comparing the average similarity of each object to its closest relative within its collection).
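
To make the formula concrete, here is a minimal sketch in Python (not Perl, and not taken from any particular cheminformatics package). It assumes each fingerprint is held as a plain integer used as a bit string; the names popcount, tanimoto and closest_relative are made up for illustration, and returning 0.0 for two empty fingerprints is just one possible convention.

```python
def popcount(x: int) -> int:
    """Count the set bits in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Bits set in both, divided by bits set in either."""
    union = popcount(a | b)
    if union == 0:
        # Assumption: two empty fingerprints score 0.0 rather than raising.
        return 0.0
    return popcount(a & b) / union


def closest_relative(query: int, fingerprints: list[int]) -> int:
    """Return the fingerprint with the highest Tanimoto score against query."""
    return max(fingerprints, key=lambda fp: tanimoto(query, fp))
```

For example, tanimoto(0b1100, 0b1010) has one bit in common and three bits set in either, so it comes out as 1/3. For a really huge data set you would want packed bit vectors and a fast population count instead of bin(), but the arithmetic is the same.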

I've probably gone too far with all this comp chem stuff, but it might be a good way to compare many patents. You could also look at the cosine or Dice coefficients, which are used in other fields.
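
For comparison, here is a rough Python sketch of the Dice and cosine coefficients in their usual binary forms, again assuming fingerprints stored as plain integers. The function names are hypothetical, and returning 0.0 when a fingerprint is empty is my own convention.

```python
import math


def popcount(x: int) -> int:
    """Count the set bits in a non-negative integer."""
    return bin(x).count("1")


def dice(a: int, b: int) -> float:
    """Twice the shared bits, divided by the total bits set in a plus b."""
    total = popcount(a) + popcount(b)
    if total == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return 2 * popcount(a & b) / total


def cosine(a: int, b: int) -> float:
    """Shared bits divided by the geometric mean of the two bit counts."""
    denom = math.sqrt(popcount(a) * popcount(b))
    if denom == 0:
        return 0.0  # assumed convention when either fingerprint is empty
    return popcount(a & b) / denom
```

Like Tanimoto, both of these only reward bits that are set in both strings, so they share the same bias toward positive information; they differ in how they normalise by the size of each fingerprint.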


In reply to Re: Huge data file and looping best practices by igelkott
in thread Huge data file and looping best practices by carillonator
