I have a algorithm problem -- one that makes me realize how much I'm missing by not having any formal CS education....

I want to deduce some appropriate groupings for the following situation:

I have three tables:

So, essentially, I have a two-dimensional grid, and if the bit in the grid is 'on', access will be granted. Now, just for discussion, let's say I have 1000 users and 100 reports. That's up to a million different, individual access permsissions that may (or may not) have been granted.

I know for sure that there are some patterns in here. Alice, Bob, and Carlos all work in Accounts Receivable, so they might get the same few reports. Likewise, Rob, Sarah and Tom might have some reports in common.

Now, I don't know who works with who, I only know this grid of 100x1000. How can I mathematically figure out which 'sets' of reports can be assigned to people instead of individual user<->report rules so that I have some efficient groups and long-term administration becomes a bit easier?

This is my starting theory:

The major problem with that is this:

I might get 4 reports. If you're my boss, you might get my 4 plus 2 more. Logically, the four we have in common could be grouped, and you'd have two individuals in addition to the group of 4. My plan would miss that entirely, and I have a strong gut feeling that I'm ignorant of some nifty methodology for doing this kind of thing.

The only other thing I've thought of is doing some kind of ratio on how well any two users match. In the prior example, it would be a 4/6 or 67% match.

I just can't quite get my head around this. I studied accounting in school, and we pretty much stopped with addition and subtraction. <g> Grouping makes my brain hurt, but since this seems like a common need, I'd think this wheel should be laying around somewhere, just not in a form that I'm recognizing at the moment.

Thanks Monks.

Update:

This question is coming up again. In response to Tilly's point about re-designing the system, that's been done. There is a 'group' function implemented, and it works correctly. However, the existing system has the old recip/report records in place and no groups back-filled.

I'm looking at 17k recipients and 121k reports, so the possibilities are *massive*, around 2billion 'squares' in the grid. 1.7M of them are in the true state, so it's very sparse.

This question is not one I have to solve, it's just one that I'd like to for learning's sake. I've put a lot of thought into it, but I just don't know enough about math theory to figure it out yet...


In reply to Deducing Ideal Groups by pboin

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.