respondent_id, response_1, response_2, .....
Searches of the form
select count(*) from theTable where (response_1=1 and response_n=1);
performs abysmally. Averages ~ 0.24 seconds per select execution. I'm looking for under 0.1 sec because I'm producing reports in real time that require a few hundred selects to do to produce a single report.
I can use respondent _id as primary key, but each of the 3,000 columns is equally likely to be searched on and I can't index them all. I don't believe mysql will allow 3,000 indexes on a single table.
So .... I got hinking about an inverted index so I could quickly retrieve the list of respondents who gave response x, the list of respondents who gave response y, and then compute their intersection.
My first version of this was to have a separate table for each possible reponse and populate it with respondent id's. That improved performance to ~0.16 sec average. Of course, each table has its own index. If, instead, I had one large inverted index,
response_id, respondent_1, respondent_2, ...
I would only need to use the one response_id column as primary key, reducing total size and complexity of the database. In that case I have two options:
I'm very interested in ideas.
Thanks.
In reply to Re^2: Does Search::InvertedIndex module live up to billing?
by punch_card_don
in thread Does Search::InvertedIndex module live up to billing?
by punch_card_don
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |