I have information about two sets of domains: those known to belong to a certain category, and those suspected of belonging to it. Each suspected domain's information is compared against that of every known domain. I'm checking the registrant and administrative contact details: company name, contact name, address, phone number, and email.

If XYZ.com's registrant address is the same as ABC.com's registrant address, then $match{'ABC.com'}{'XYZ.com'}{address} |= 1. If XYZ.com's admin address is the same as ABC.com's admin address, then $match{'ABC.com'}{'XYZ.com'}{address} |= 2. The same goes for the other fields. Thus, I have a hash that looks like:

    %match = (
        'ABC.com' => {
            'XYZ.com' => {
                company => 1,   # reg. only
                contact => 0,   # no match
                address => 3,   # reg. and admin.
                phone   => 2,   # admin. only
                email   => 2,   # admin. only
            },
            ...,
        },
        ...
    );

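For concreteness, here is a minimal sketch of how those flags get set. The %known and %suspect hashes, their registrant/admin layout, and all of the sample contact data are hypothetical and exist only for illustration; only the bit values (1 for registrant, 2 for admin) and the field names come from the description above.

```perl
use strict;
use warnings;

# Bit flags as described above: 1 = registrant matched, 2 = admin matched.
my %bit    = (registrant => 1, admin => 2);
my @fields = qw(company contact address phone email);

# Hypothetical sample records (made-up data, not real WHOIS output).
my %known = (
    'ABC.com' => {
        registrant => { company => 'Acme', contact => 'A. Smith',
                        address => '1 Main St', phone => '555-0100',
                        email   => 'a@abc.example' },
        admin      => { company => 'Acme Hosting', contact => 'B. Jones',
                        address => '1 Main St', phone => '555-0199',
                        email   => 'noc@abc.example' },
    },
);
my %suspect = (
    'XYZ.com' => {
        registrant => { company => 'Acme', contact => 'C. Doe',
                        address => '1 Main St', phone => '555-0123',
                        email   => 'c@xyz.example' },
        admin      => { company => 'Other', contact => 'D. Roe',
                        address => '1 Main St', phone => '555-0199',
                        email   => 'noc@abc.example' },
    },
);

# Compare every suspected domain against every known domain, field by field,
# OR-ing in the bit for each role whose values are identical.
my %match;
for my $k (keys %known) {
    for my $s (keys %suspect) {
        for my $f (@fields) {
            $match{$k}{$s}{$f} = 0;
            for my $role (keys %bit) {
                $match{$k}{$s}{$f} |= $bit{$role}
                    if $known{$k}{$role}{$f} eq $suspect{$s}{$role}{$f};
            }
        }
    }
}
```

With this sample data, $match{'ABC.com'}{'XYZ.com'} ends up with the same values as the hash shown above: company 1, contact 0, address 3, phone 2, email 2.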
I want to come up with a logical metric that will let me sort ABC.com's suspected linked domains from most likely connection to least likely. I don't want to sort merely by the number of fields with a non-zero value, but I don't want to multiply the values either, and I'm not sure adding them makes sense. I'm also not sure whether weighting should come into play; the fields have all been error-checked, so there's no (truly) bogus data in them. (Some bogus data is there, but it's uniform, which is a good thing. Never mind.)

Can someone enlighten me?


Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart

In reply to Metric for confidence of complex match by japhy
