Re: Feasibility of quick generation of complex report from large db?

Several reactions to this...

The original table that you ask us to “imagine” is a crosstab (a statistical term, not a database term). It is probably a result format, not an appropriate base representation for the data. Likewise, you really do not want to have “12,000 tables.”
Instead, let's generalize further:
- You have widgets and you have companies.
- One of the attributes of a company is that company's size_factor.
- One of the attributes of a widget is its class.
- Joining these two tables is a third table, which we might call (say) company_widget. Its purpose is to record, e.g. with a pair of columns named company and widget, that a particular company uses a particular widget. (If that use has attributes of its own, that is, salient details of how a particular company uses a particular widget, those details would be additional columns in company_widget, because they are attributes not of the widget alone, nor of the company alone, but of the relationship that the table describes.) This is what we call a many-to-many relationship.
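As a rough illustration of that three-table layout, here is a sketch using SQLite through Python's built-in sqlite3 module, chosen only so it runs standalone. The column names size_factor, class, company and widget come from the description above. The id and name columns and the sample rows are assumptions.

```python
import sqlite3

# Sketch of the three-table design. size_factor, class, company and widget
# come from the post; id/name columns and sample rows are assumptions.
SCHEMA = """
CREATE TABLE company (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    size_factor REAL NOT NULL
);
CREATE TABLE widget (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    class TEXT NOT NULL
);
-- One row per (company, widget) pair: "this company uses this widget".
-- Attributes of the use itself would be extra columns in this table.
CREATE TABLE company_widget (
    company INTEGER NOT NULL REFERENCES company(id),
    widget  INTEGER NOT NULL REFERENCES widget(id),
    PRIMARY KEY (company, widget)
);
"""

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

if __name__ == "__main__":
    conn = make_db()
    conn.executemany("INSERT INTO company VALUES (?, ?, ?)",
                     [(1, "ABC Inc", 2.0), (2, "XYZ Inc", 0.5)])
    conn.executemany("INSERT INTO widget VALUES (?, ?, ?)",
                     [(10, "ballpoint", "pens"), (11, "swivel", "desk chairs")])
    conn.executemany("INSERT INTO company_widget VALUES (?, ?)",
                     [(1, 10), (1, 11), (2, 10)])
    # Which companies use a widget of class 'pens'?
    rows = conn.execute("""
        SELECT DISTINCT c.name
        FROM company c
        JOIN company_widget cw ON cw.company = c.id
        JOIN widget w          ON w.id = cw.widget
        WHERE w.class = 'pens'
        ORDER BY c.name
    """).fetchall()
    print(rows)  # [('ABC Inc',), ('XYZ Inc',)]
```

Because company_widget's primary key is the pair itself, the same company/widget link cannot be recorded twice.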
So now, instead of “tens of thousands of” tables, we have only three.
What you're going to do, then, is run probably one SQL query, using clauses such as GROUP BY, to obtain information about a selected company's use of widgets. The result comes back to you as a flat, sequential list of rows. You would then read it into a two-dimensional Perl array that you have first filled entirely with zeroes. In other words, you are building the crosstab from the result list you have just obtained from SQL.
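Here is a minimal sketch of that query-then-pivot step, again in Python with sqlite3 standing in for Perl and DBI. It assumes the same three tables with hypothetical id columns and made-up rows. For illustration it cross-tabulates every company against widget class (counting widgets) instead of a single selected company.

```python
import sqlite3

# Assumed schema and sample data (hypothetical ids and rows).
def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, size_factor REAL);
        CREATE TABLE widget  (id INTEGER PRIMARY KEY, class TEXT);
        CREATE TABLE company_widget (company INTEGER, widget INTEGER,
                                     PRIMARY KEY (company, widget));
        INSERT INTO company VALUES (1, 'ABC Inc', 2.0), (2, 'XYZ Inc', 0.5);
        INSERT INTO widget  VALUES (10, 'pens'), (11, 'pens'), (12, 'plywood');
        INSERT INTO company_widget VALUES (1, 10), (1, 11), (1, 12), (2, 12);
    """)
    return conn

def crosstab(conn):
    # The single query: one row per (company, class) pair that actually occurs.
    rows = conn.execute("""
        SELECT c.name, w.class, COUNT(*)
        FROM company_widget cw
        JOIN company c ON c.id = cw.company
        JOIN widget  w ON w.id = cw.widget
        GROUP BY c.name, w.class
    """).fetchall()

    # Fix the row and column order, then fill the whole grid with zeroes
    # so that pairs absent from the result still get a cell.
    row_keys = sorted({r[0] for r in rows})
    col_keys = sorted({r[1] for r in rows})
    grid = [[0] * len(col_keys) for _ in row_keys]
    r_idx = {k: i for i, k in enumerate(row_keys)}
    c_idx = {k: j for j, k in enumerate(col_keys)}
    for name, cls, n in rows:
        grid[r_idx[name]][c_idx[cls]] = n
    return row_keys, col_keys, grid

if __name__ == "__main__":
    names, classes, grid = crosstab(build_demo_db())
    print(classes)  # ['pens', 'plywood']
    for name, line in zip(names, grid):
        print(name, line)
    # ABC Inc [2, 1]
    # XYZ Inc [0, 1]
```

The zero fill matters because GROUP BY only returns pairs that exist. XYZ Inc uses no pens, so no row for that pair comes back, and the cell has to default to 0.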
HTML templating systems such as Template Toolkit (on CPAN) can build an HTML table from a Perl array or hash using nothing but template directives.
Quite probably, this “immensely complicated problem” has just reduced itself to something trivial. The “hundreds of” queries that at first blush seem to be needed might well be reduced to just one.

Re^2: Feasibility of quick generation of complex report from large db? by punch_card_don (Curate) on Feb 06, 2008 at 16:35 UTC
Wonderfully rich reply, thanks. But I think there may be some miscommunication. To put it in similar terms:

- I have `companies` (e.g. ABC Inc, XYZ Inc, ...) and I have `product classes` (e.g. pens, liquid nitrogen, desk chairs, forklifts, plywood, Fourier-transform infrared spectrometers, ...).
- Yes, one of the attributes of a company is its `size_factor`.
- A `product_class` has no attributes.

Someone has canvassed ~30,000 companies and asked them:

- Do you belong to industry group X (e.g. more than X employees)?
- Do you belong to industry group Y (e.g. located in Ontario, California)?
- Do you belong to industry group Z (e.g. less than 5 years in business)?
- Do you use product class A?
- Do you use product class B?
- Do you use product class C?

...and so on, for ~12,000 product classes. So all the info could have been put into one big 30,000-row x 12,000-column table, and then queried with just:

`SELECT company FROM <big_table> WHERE (group_a = 1 AND product_class_C = 1);`

But I would still have to repeat that query for 350 combinations of group and product_class.

Originally, the requirement was only to produce a half-dozen cross-references at a time. Hence the idea of the reverse-index tables: it is quick and easy to identify the ten tables needed, find the intersection of these little tables (each <= 10 KB and often only a few hundred rows long), and sum the size factors. But THEN someone said, "Hey, that's great, and fast. Here, do that for this report of 350 intersections."

Update: I appear to have at least partially answered my own question. Even if it were done the other way, there would still be 350 queries to run, except each one against a full 30k-row x 12k-column table!
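The reverse-index idea can be sketched in a few lines. Each industry group or product class maps to the set of company ids that answered 'yes'. A cross-reference is the intersection of those sets, and its value is the summed size_factor. The keys, ids and numbers here are hypothetical. In the post, each set lives in its own small table.

```python
# Reverse index: key -> set of company ids that answered "yes".
# All keys, ids and size factors below are made up for illustration.
size_factor = {1: 2.0, 2: 0.5, 3: 1.5, 4: 1.0}

index = {
    "group_a":         {1, 2, 3},
    "group_b":         {2, 3, 4},
    "product_class_C": {1, 3, 4},
    "product_class_D": {2, 4},
}

def intersection_weight(keys):
    """Sum size_factor over companies present in every listed index."""
    # Intersect smallest-first so the working set stays as small as possible.
    sets = sorted((index[k] for k in keys), key=len)
    common = set.intersection(*sets)
    return sum(size_factor[c] for c in common)

# A report is a list of intersections; each one is a cheap set
# intersection over small sets rather than a query against a wide table.
report = [("group_a", "product_class_C"), ("group_b", "product_class_D")]
for keys in report:
    print(keys, intersection_weight(keys))
```

With 350 intersections in a report, this is simply 350 calls. As the update above notes, the count stays at 350 either way, but each step here touches only a few hundred ids.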
Re^3: Feasibility of quick generation of complex report from large db? by talexb (Chancellor) on Feb 06, 2008 at 17:01 UTC
This sounds like a bit of heavy lifting for a database server, but not an impossibility. What I'm not getting is how often you have to regenerate this big table. Every time a web user wants to look at it? Every time something in the underlying database changes? I would imagine you could generate and then cache the 'latest version' of the table at the end of every day, assuming new data is added throughout the day.
Re^4: Feasibility of quick generation of complex report from large db? by punch_card_don (Curate) on Feb 06, 2008 at 17:42 UTC
Hey talexb, yes, every time a user wants to see it is the request. The idea being circulated right now is that various combinations of product_classes and industry_groups will be dreamed up for reports. There are likely to be ~6,000 of these reports. Yes, someone is going to dream up ~6,000 combinations of 350 intersections. A user then goes to the list of reports and selects the ones he wants to see today. And so, yes, I'm toying with the pre-generation option too, since once the data has been loaded, it never, ever changes. But at some point, you just know they're going to want to generate every possible set and subset of intersections, so I think pre-generating static files would only delay the inevitable pain.
Re^3: Feasibility of quick generation of complex report from large db? by pc88mxer (Vicar) on Feb 06, 2008 at 17:15 UTC
`So, all the info could have been put into one big 30,000 row x 12,000 column table ...`

I don't think you want a 12,000-column table. Just use an 'intersection' table with only two columns: one for the company and the other for a question it answered in the affirmative. There would be no rows for questions answered 'no'. Your example query can be performed using a self-join like this:

`SELECT a.company FROM CA AS a, CA AS b WHERE a.company = b.company AND a.answer = 'group_a' AND b.answer = 'product_class_c';`

(It should be obvious that indexes on the 'company' and 'answer' columns would be very helpful.) Note that this method lets you add more questions without changing the table's schema.
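The intersection table and self-join above can be tried out with a sketch like this, using SQLite via Python's sqlite3. The CA table, its two columns and the query come from the post. The index names and sample rows are made up.

```python
import sqlite3

# Two-column intersection table (CA, from the post) with indexes on both
# columns. Index names and rows are made up; only "yes" answers are stored.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CA (company TEXT NOT NULL, answer TEXT NOT NULL);
    CREATE INDEX ca_company ON CA (company);
    CREATE INDEX ca_answer  ON CA (answer);
""")
conn.executemany("INSERT INTO CA VALUES (?, ?)", [
    ("ABC Inc", "group_a"), ("ABC Inc", "product_class_c"),
    ("XYZ Inc", "group_a"),
    ("QRS Ltd", "product_class_c"),
])

# The self-join: companies that said yes to both questions.
rows = conn.execute("""
    SELECT a.company
    FROM CA AS a, CA AS b
    WHERE a.company = b.company
      AND a.answer = 'group_a'
      AND b.answer = 'product_class_c'
""").fetchall()
print(rows)  # [('ABC Inc',)]

# A new question is just new rows; the schema does not change.
conn.execute("INSERT INTO CA VALUES ('XYZ Inc', 'product_class_d')")
```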
Re^4: Feasibility of quick generation of complex report from large db? by punch_card_don (Curate) on Feb 06, 2008 at 17:55 UTC
“I don't think you want a 12,000 column table.”

On this, we are in complete agreement.

“Just use an 'intersection' table which has just two columns, one for the company and the other for the question they answered in the affirmative. There would not be any rows for questions they answered 'no' to.”

If I understand correctly, then if each company has answered 'yes' to ~500 questions on average, I'll have a two-column, 15-million-row table (30,000 x 500)? And I'd still have 350 individual queries to run on this table, no?
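On the 350-queries question, here is one way the whole report could be folded into a single statement. Put the requested (group, product class) pairs in their own table, join it to CA twice, and GROUP BY pair. This sketch assumes a company table carrying size_factor and a hypothetical 'wanted' table. All rows are made up.

```python
import sqlite3

# Assumptions: a company table with size_factor, the two-column CA table,
# and a hypothetical 'wanted' table listing the report's intersections.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (name TEXT PRIMARY KEY, size_factor REAL NOT NULL);
    CREATE TABLE CA (company TEXT NOT NULL, answer TEXT NOT NULL);
    CREATE INDEX ca_answer_company ON CA (answer, company);
    CREATE TABLE wanted (grp TEXT NOT NULL, cls TEXT NOT NULL);
""")
conn.executemany("INSERT INTO company VALUES (?, ?)",
                 [("ABC Inc", 2.0), ("XYZ Inc", 0.5), ("QRS Ltd", 1.5)])
conn.executemany("INSERT INTO CA VALUES (?, ?)", [
    ("ABC Inc", "group_a"), ("ABC Inc", "product_class_c"),
    ("XYZ Inc", "group_a"), ("XYZ Inc", "product_class_d"),
    ("QRS Ltd", "group_b"), ("QRS Ltd", "product_class_c"),
])
# In a real report this table would hold the ~350 requested pairs.
conn.executemany("INSERT INTO wanted VALUES (?, ?)", [
    ("group_a", "product_class_c"),
    ("group_a", "product_class_d"),
    ("group_b", "product_class_d"),  # no company matches this pair
])

# One query for every intersection: sum size_factor per requested pair.
rows = conn.execute("""
    SELECT w.grp, w.cls, COALESCE(SUM(c.size_factor), 0) AS total
    FROM wanted AS w
    LEFT JOIN CA AS a      ON a.answer = w.grp
    LEFT JOIN CA AS b      ON b.answer = w.cls AND b.company = a.company
    LEFT JOIN company AS c ON c.name = b.company
    GROUP BY w.grp, w.cls
    ORDER BY w.grp, w.cls
""").fetchall()
for r in rows:
    print(r)
```

The LEFT JOINs keep a row, with a total of 0, for any requested pair that no company matches. The composite (answer, company) index is meant to keep both CA lookups cheap on a table of ~15 million rows. The query assumes CA holds no duplicate (company, answer) rows, since duplicates would be counted twice.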
Re^5: Feasibility of quick generation of complex report from large db? by pc88mxer (Vicar) on Feb 06, 2008 at 18:36 UTC
Re^6: Feasibility of quick generation of complex report from large db? by punch_card_don (Curate) on Feb 06, 2008 at 19:47 UTC