punchcard_don has asked for the wisdom of the Perl Monks concerning the following question:
I realize this is more of an SQL question, but all the middleware will be Perl querying a mySQL database using DBI, I promise.
Not looking for someone to tell me what to do, or do my job. I'm seeking methodological or, even better, quantitative, means to choose/justify one SQL database architecture over another where query performance is the priority.
Imagine this: A database holds students' school exam results. By the end of their scholastic career, each person has taken 500 exams. Every exam had 50 multiple-choice questions with answers numbered 1 to 9. Our starting point is a flat-file of records for 20,000 students of the form:
student1, exam1, answer1, answer2,..., answer50
student1, exam2, answer1, answer2,..., answer50
...
student1, exam500, answer1, answer2,..., answer50
student2, exam1, answer1, answer2,..., answer50
student2, exam2, answer1, answer2,..., answer50
...
student2, exam500, answer1, answer2,..., answer50
Now we want to compile statistics on who answered what. What percentage of total students who got question x in exam y right, also got question b of exam c right? In other words
SELECT COUNT(*) WHERE response-to-queston-x-on-exam-z=y AND response-to-queston-b-on-exam-d=c
and we want to do this over the web letting researchers specify the questions and answers they query on.
A few possible architectures:
1. Design the database exactly as the flat-file and take the intersection of
SELECT COUNT(*) WHERE (exam=z AND answeri=y)
with
SELECT COUNT(*) WHERE (exam=d AND answerj=c)
2. Append all the records for a given student to make one 2,500 column record per student and search
SELECT COUNT(*) WHERE exami_answerj=y AND examb_answerc=d
3. Give each of the possible 25,000 answers to each multiple-choice question a unique 5-digit identifier and its own column, then just signify if that answer was given by a "1" in its column, and search
SELECT COUNT(*) WHERE ansxxxxx AND ansyyyyy
4. Make a separate table for each possible answer with studentID as the key, then take interstection of searches.
5. Make a separate table for each question, leaving the multiple-choice answer in the 2,500 tables and searching for the desired answer
But as I said, I'm hoping for some method or analysis by which to make/justify the choice. "I think XXX would be best" won't impress my boss much. Is there a way to choose - preferably quantitatively? Or does one just have to build, benchmark, and repeat?
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: SQL database architecture
by perrin (Chancellor) on Nov 03, 2003 at 19:54 UTC | |
by Art_XIV (Hermit) on Nov 03, 2003 at 20:17 UTC | |
|
Re: SQL database architecture
by jdtoronto (Prior) on Nov 03, 2003 at 20:44 UTC | |
|
Re: SQL database architecture
by etcshadow (Priest) on Nov 03, 2003 at 21:00 UTC | |
|
Re: SQL database architecture
by graff (Chancellor) on Nov 04, 2003 at 03:43 UTC | |
by etcshadow (Priest) on Nov 04, 2003 at 04:20 UTC | |
by graff (Chancellor) on Nov 04, 2003 at 05:37 UTC | |
by tilly (Archbishop) on Nov 04, 2003 at 06:45 UTC | |
by etcshadow (Priest) on Nov 04, 2003 at 06:07 UTC |