If I understand correctly if on average each company has answered 'yes +' to ~500 questions, I'll have a two-column, 15-million row table?

You're going to have 15 million somethings - you have a lot of data! This is one of the most efficient ways to represent a large sparse matrix. Also, having the data in this form will allow you answer so many other kinds of questions.

As I read your original problem statement, you are only are interested in the number of companies in each intersection. Moreover, it seems that you are only interested in cross-indexing company group by product class. I would create two intersection tables, one for the company group answers (call it CG), and another for the company product class answers (call it CP).

You'll still have the 15M rows, but it will be split between two tables.

Then you can do this with one query:

SELECT CG.answer as cgroup, CP.answer as pclass, count(*) FROM CG, CP WHERE CG.company = CP.company GROUP BY cgroup, pclass;

With the proper indexes, I think you'll be surprised at how fast this can be computed. Depending on the database you're using, you might want to do a LEFT JOIN instead of a regular join. You'll be able to tell by doing an explain on the different queries.

You can do it with the original schema, except that it would be harder to restrict the join condition to just company-group and product-class.


In reply to Re^5: Feasability of quick generation of complex report from large db? by pc88mxer
in thread Feasability of quick generation of complex report from large db? by punch_card_don

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.