On the other hand, your existing flat file has 10 million rows (20,000 students * 500 exams) -- a lot less than 500 million, but then each row has an additional 48 columns.
So if some sort of flat table seems appropriate (and frankly, it probably is, given your description), then the choice of flat table design would hinge on relative performance given 50 times more rows vs. 48 additional columns per row.
If it were me, the first thing I'd try would be fewer rows / more columns. In this case, your initial guess about the kind of query operation was a bit off. No doubt you would figure this out yourself as soon as you set to work on it in earnest, but I think the queries would be more like this:

    -- count people who said "a" on exam 1, question 20 and
    -- "c" on exam 43, question 37:
    select count(*) from the_table
     where exam_id = 1 and answer20 = 'a'
       and student_id in
         (select student_id from the_table
           where exam_id = 43 and answer37 = 'c')

Of course, as soon as you need to intersect on three or more different questions, each from a different exam, you'll probably want to try a perl hash with a set of simple queries instead: each query just selects student_ids where "exam_id = ? and answer_X = ?", and you increment hash elements keyed by student_id after each query. Then pick out the hash keys whose value equals the number of queries performed.
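The hash-intersection idea can be sketched like this. In real use each inner list would come from a DBI query (selectcol_arrayref on "select student_id from the_table where exam_id = ? and answerNN = ?"); here the query results are faked with literal lists so the counting logic is visible on its own, and all the ids are made up:

```perl
use strict;
use warnings;

# Stand-ins for query results: each "query" returns the student_ids
# matching one (exam, question, answer) condition.
my @query_results = (
    [ 101, 102, 103 ],    # said "a" on exam 1, question 20
    [ 102, 103, 104 ],    # said "c" on exam 43, question 37
    [ 103, 104, 105 ],    # said "b" on exam 7, question 5
);

# Tally how many conditions each student matched.
my %hits;
for my $result (@query_results) {
    $hits{$_}++ for @$result;
}

# Keep only the students who matched every condition.
my @matched = sort grep { $hits{$_} == @query_results } keys %hits;
print "@matched\n";    # prints "103"
```

The nice property here is that each query stays a simple single-table select, so the database never has to do an n-way self-join.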
It would probably be worthwhile to test the two flat-table alternatives (4 cols * lots of rows vs. 52 cols * fewer rows) on some suitably large sample of data, to see which one is easier/faster/more flexible.
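For reference, the two layouts being benchmarked might look something like this -- table and column names are placeholders, and the exact column counts are just taken from the numbers above:

```sql
-- Alternative 1: tall and narrow, one row per answer
-- (4 columns, on the order of 500 million rows)
create table answers_tall (
    student_id  int     not null,
    exam_id     int     not null,
    question_id int     not null,
    answer      char(1),
    primary key (student_id, exam_id, question_id)
);

-- Alternative 2: short and wide, one row per (student, exam)
-- (52 columns, 10 million rows)
create table answers_wide (
    student_id int not null,
    exam_id    int not null,
    answer01   char(1),
    answer02   char(1),
    -- ... one column per question, up to ...
    answer48   char(1),
    primary key (student_id, exam_id)
);
```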
In reply to Re: SQL database architecture
by graff
in thread SQL database architecture
by punchcard_don