I'm trying to put it into practice. Here's the situation:
I have ~20,000 users, each reporting an average of 600 characteristics out of a possible 3,000 characteristics. The objective of this system is to count users who correspond to various collections of characteristics, in the general form:
count all users who reported characteristics a, b, and c
I've created a two-column table that looks like this:
characteristic_id (int), user_id (int)
with no indexes to start, as a baseline. With 600 rows for the average user, that makes 12 million rows in total.
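For anyone who wants to poke at the same shape of data, here's a toy stand-in using Python's sqlite3 module (the table name matches mine; the sample rows are made up just for illustration):

```python
import sqlite3

# In-memory stand-in for the real table: one row per (characteristic, user) pair.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE theTable (characteristic_id INTEGER, user_id INTEGER)")

# Tiny fake data set: user 1 reports characteristics 123 and 124, user 2 only 123.
rows = [(123, 1), (124, 1), (123, 2)]
conn.executemany("INSERT INTO theTable VALUES (?, ?)", rows)

total = conn.execute("SELECT COUNT(*) FROM theTable").fetchone()[0]
print(total)  # 3
```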
Following your suggestions, I connected by telnet to execute a few queries.
select count(*) from theTable
correctly reports 12,000,000 rows in 0.00s
select count(*) from theTable where characteristic_id = 123
reports in 0.02s
At this point I'm thinking this is really great. But then
select count(*) from theTable where user_id = 456
reports in 10.77s
And then applying VSarkiss' excellent style:
select count(*)
from theTable A
inner join theTable B on A.user_id = B.user_id
where A.characteristic_id = 123
and B.characteristic_id = 124
reports in 3min 23s!
So - in effect, nothing to do with Perl.
But a new problem: it takes several dozen such queries to compile the report data I need, and clearly 3 minutes per query is unacceptable.
Personally, I'm amazed by the magnitude of the increase in response time.
Sooooooo - dare I hope to see a two-orders-of-magnitude decrease in response time by adding indexes? Or do you think my database design is fubar?
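I can't vouch for the exact MySQL timings, but here's a sketch (in SQLite, where CREATE INDEX syntax is essentially the same; the index names idx_char_user and idx_user are mine) of the indexes I'd try first: a composite (characteristic_id, user_id) index that covers the self-join, plus a plain user_id index for the slow per-user count:

```python
import sqlite3

# Stand-in for the MySQL table; data is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE theTable (characteristic_id INTEGER, user_id INTEGER)")
conn.executemany("INSERT INTO theTable VALUES (?, ?)",
                 [(123, 1), (124, 1), (123, 2), (124, 3)])

# Composite index: the WHERE clause seeks on characteristic_id, and user_id is
# read straight out of the index entries, so the join never touches the table.
conn.execute("CREATE INDEX idx_char_user ON theTable (characteristic_id, user_id)")
# Separate index for the slow "all rows for one user" lookup.
conn.execute("CREATE INDEX idx_user ON theTable (user_id)")

query = """
    SELECT COUNT(*)
    FROM theTable A
    INNER JOIN theTable B ON A.user_id = B.user_id
    WHERE A.characteristic_id = 123 AND B.characteristic_id = 124
"""
n = conn.execute(query).fetchone()[0]
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(n)               # 1
for row in plan:
    print(row[-1])     # the plan should mention idx_char_user, not a full scan
```

With indexed lookups replacing full scans on a 12-million-row table, two orders of magnitude doesn't seem like an unreasonable hope.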
In reply to Re: Does DBI handle table alias?
by punch_card_don
in thread Does DBI handle table alias?
by punch_card_don