
Well, I've tried the 3,000 column by 20,000 row thing.

respondent_id, response_1, response_2, .....

Searches of the form

select count(*) from theTable where (response_1=1 and response_n=1);

perform abysmally, averaging ~0.24 seconds per select execution. I'm looking for under 0.1 sec because I'm producing reports in real time, and a single report requires a few hundred selects.

I can use respondent_id as primary key, but each of the 3,000 columns is equally likely to be searched on and I can't index them all. I don't believe MySQL will allow 3,000 indexes on a single table.

So .... I got thinking about an inverted index so I could quickly retrieve the list of respondents who gave response x, the list of respondents who gave response y, and then compute their intersection.
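To make the idea concrete, here is a minimal in-memory sketch of that lookup-and-intersect step. It is written in Python purely for illustration; the function names (`build_inverted_index`, `count_both`) and the sample data are made up, and nothing here reflects how Search::InvertedIndex works internally.

```python
from collections import defaultdict

def build_inverted_index(answers):
    """Map each response ID to the set of respondent IDs who gave it."""
    index = defaultdict(set)
    for respondent_id, response_ids in answers.items():
        for response_id in response_ids:
            index[response_id].add(respondent_id)
    return index

def count_both(index, response_x, response_y):
    """Count respondents who gave both responses, via set intersection."""
    # .get() avoids creating empty entries for responses nobody gave.
    return len(index.get(response_x, set()) & index.get(response_y, set()))

# Hypothetical sample data: respondent ID -> response IDs they gave.
answers = {
    101: [1, 7, 42],
    102: [1, 42],
    103: [7],
    104: [1, 7],
}

index = build_inverted_index(answers)
print(count_both(index, 1, 7))  # respondents 101 and 104 gave both, so 2
```

Each count touches only the two respondent lists involved, not all 20,000 rows, which is the whole point of inverting the table.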

My first version of this was to have a separate table for each possible response and populate it with respondent IDs. That improved performance to ~0.16 sec average. Of course, each table has its own index. If, instead, I had one large inverted index,

response_id, respondent_1, respondent_2, ...

I would only need to use the one response_id column as primary key, reducing total size and complexity of the database. In that case I have two options:

1. Put respondent IDs in the columns, so my selects become
select * from theTable where response_id = x
2. Use the response IDs as column names and fill with 1s and 0s, but then I need a way to associate matches with their column names.....
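For the 1s-and-0s option, one way to tie matches back to names is to fix an ordering once and treat each row as a bitmap. The sketch below is only an illustration under my own assumptions: one bit per respondent, Python integers standing in for the stored rows, and made-up names (`build_bitmaps`, `matching_respondents`) and data.

```python
def build_bitmaps(answers, respondent_ids):
    """Build one integer bitmap per response; bit i stands for respondent_ids[i]."""
    position = {rid: i for i, rid in enumerate(respondent_ids)}
    bitmaps = {}
    for rid, response_ids in answers.items():
        for response_id in response_ids:
            bitmaps[response_id] = bitmaps.get(response_id, 0) | (1 << position[rid])
    return bitmaps

def matching_respondents(bitmaps, respondent_ids, response_x, response_y):
    """AND the two bitmaps, then map the set bits back to respondent IDs."""
    both = bitmaps.get(response_x, 0) & bitmaps.get(response_y, 0)
    return [rid for i, rid in enumerate(respondent_ids) if (both >> i) & 1]

# Hypothetical sample data: respondent ID -> response IDs they gave.
answers = {
    101: [1, 7, 42],
    102: [1, 42],
    103: [7],
    104: [1, 7],
}
respondent_ids = sorted(answers)  # fixed order, so bit positions stay stable

bitmaps = build_bitmaps(answers, respondent_ids)
matches = matching_respondents(bitmaps, respondent_ids, 1, 7)
print(matches, len(matches))
```

If only the count is needed, the number of set bits in the ANDed value is enough and the mapping step can be skipped.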

Soooo, I was thinking of trying the search engine trick and building a reverse index like Search::InvertedIndex.

I'm very interested in ideas.

Thanks.

In reply to Re^2: Does Search::InvertedIndex module live up to billing? by punch_card_don
in thread Does Search::InvertedIndex module live up to billing? by punch_card_don
