comment on

Fellow Monks,

I realize this is more of an SQL question, but all the middleware will be Perl querying a mySQL database using DBI, I promise.

Not looking for someone to tell me what to do, or do my job. I'm seeking methodological or, even better, quantitative, means to choose/justify one SQL database architecture over another where query performance is the priority.

Imagine this: A database holds students' school exam results. By the end of their scholastic career, each person has taken 500 exams. Every exam had 50 multiple-choice questions with answers numbered 1 to 9. Our starting point is a flat-file of records for 20,000 students of the form:

student1, exam1, answer1, answer2,..., answer50
student1, exam2, answer1, answer2,..., answer50
...
student1, exam500, answer1, answer2,..., answer50

student2, exam1, answer1, answer2,..., answer50
student2, exam2, answer1, answer2,..., answer50
...
student2, exam500, answer1, answer2,..., answer50

Now we want to compile statistics on who answered what. What percentage of total students who got question x in exam y right, also got question b of exam c right? In other words

SELECT COUNT(*) WHERE response-to-queston-x-on-exam-z=y AND response-to-queston-b-on-exam-d=c

and we want to do this over the web letting researchers specify the questions and answers they query on.

A few possible architectures:

1. Design the database exactly as the flat-file and take the intersection of

SELECT COUNT(*) WHERE (exam=z AND answeri=y)
with
SELECT COUNT(*) WHERE (exam=d AND answerj=c)

2. Append all the records for a given student to make one 2,500 column record per student and search

SELECT COUNT(*) WHERE exami_answerj=y AND examb_answerc=d

3. Give each of the possible 25,000 answers to each multiple-choice question a unique 5-digit identifier and its own column, then just signify if that answer was given by a "1" in its column, and search

SELECT COUNT(*) WHERE ansxxxxx AND ansyyyyy

4. Make a separate table for each possible answer with studentID as the key, then take interstection of searches.

5. Make a separate table for each question, leaving the multiple-choice answer in the 2,500 tables and searching for the desired answer

But as I said, I'm hoping for some method or analysis by which to make/justify the choice. "I think XXX would be best" won't impress my boss much. Is there a way to choose - preferably quantitatively? Or does one just have to build, benchmark, and repeat?

In reply to SQL database architecture by punchcard_don

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.