comment on

If you had any sort of design that put a single answer per row in some table, the nature of the questions being asked would require that the row also identifies the particular student and the particular exam for the given answer. That table would have 500 million rows (20,000 students * 500 exams * 50 answers per exam). A single flat table of this sort would handle the application you describe, but I'd be uncertain about the response time for querying a table of this size.

On the other hand, your existing flat file has 10 million rows (20,000 students * 500 exams) -- a lot less that 500 million, but then each row has an additional 48 columns.

So if some sort of flat table seems appropriate (and frankly, it probably is, given your description), then the choice of flat table design would hinge on relative performance given 50 times more rows vs. 48 additional columns per row.

If it were me, the first thing I'd try would be fewer rows / more columns. In this case, your initial guess about the kind of query operation was a bit off. No doubt you would figure this out yourself as soon as you set to work on it in earnest, but I think the queries would be more like this:

-- count people who said "a" on exam 1,question 20 and
--                       "c" on exam 43, question 37:
select count(*) from the_table where exam_id=1 and answer20='a' and 
  student_id in (select student_id from the_table where
                 exam_id=43 and answer37='c')
[download]

Of course, as soon as you need to intersect on three or more different questions, each from a different exam, you'll probably want to try using a perl hash array with a set of simple queries: each query just selects student_id's where 'exam_id=? and answer_X=?", and you increment hash elements keyed by student_ids after each query. Then pick out the hash keys whose value equals the number of queries performed.

It would probably be worthwhile to test the two flat-table alternatives (4 cols * lots of rows vs. 52 cols * fewer rows) on some suitably large sample of data, to see which one is easier/faster/more flexible.

In reply to Re: SQL database architecture by graff
in thread SQL database architecture by punchcard_don

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.