Re^3: SQL database architecture

Yes, these are very good points -- it's the same quantity of data no matter how you slice it up. I just said that the wider table would be the first thing I'd try -- and I did say I'd want to test the other way as well. 8^)

Since the wide table is just like the original flat file, there's a slight bit less work setting up and loading the DB, but that's not necessarily a good reason to pick it. Meanwhile, with the narrower, longer table, the example queries would be of the form:

select count(*) from long_table where exam_id=? and question_id=? and 
+answer=?
  and student_id in (select student_id from long_table where
                     exam_id=? and question_id=? and answer=?)
[download]

(with the parameters for the sub-query being different from those for the main (count()) query); and for more complicated conditions, doing queries for single "exam_id/question_id/answer" tuples and tracking the intersection through a perl hash array.

Looking at it this way, I have to agree with you, that the long/narrow table will likely end up being better over all, because a single query with three paramaterized conditions ("where exam_id=? and question_id=? and answer=?") will do the work of 50 different queries that would all have to be constructed and prepared separately on a wide table:

"where exam_id=? and answer_$x=?" for $x (1..50);
[download]

And of course, there are bound to be flexibility issues that the OP didn't mention (or hasn't realized), and the narrow table will make those easier as well.

update: As etcshadow surmised below, I am not especially adept at doing SQL joins; in the absence of proper training in RDB methods, I tend to fall back on habits of thought I've learned in other (non-RDB) areas. Up to now my limited skill with joins applied merely to combining distinct data from related tables based on a shared key, not trying to combine "disjoint" conditions on a single flat table, as shown in tilly's reply below; I'm very grateful and much better off for having seen that -- thanks! (And apologies for the off-beat advice above.)

Comment on Re^3: SQL database architecture Select or Download Code

Replies are listed 'Best First'.
Re: Re^3: SQL database architecture by tilly (Archbishop) on Nov 04, 2003 at 06:45 UTC
I would write those queries as: `select count() from answer_table a_1, answer_table a_2 where a_1.student_id = a_2.student_id and a_1.exam_id = ? and a_1.question_id = ? and a_1.answer = ? and a_2.exam_id = ? and a_2.question_id = ? and a_2.answer = ?` [download] If you have indexes on the exam and question of the table, the database should use that to fetch back just the answers to that question of that exam (the rest of the table won't be looked at), and will then join it very efficiently. This is virtually instantaneous. If you don't have any indexes, then you will probably wind up scanning the full table twice. Which is not good, but if you are using it for bulk access, it is not horrible either... In short, live by the saying, The key, the whole key, and nothing but the key. So help me Codd.* (Until you actually observe, not theorize, the existence of performance problems.)	[reply] [d/l]
Re: Re^3: SQL database architecture by etcshadow (Priest) on Nov 04, 2003 at 06:07 UTC
That's cool. BTW, (and I appologize if this comes off as condescending... that's not how it's intended) are you familiar with "joins" in relational databases? It's very much at the heart of what makes a relational database "relational". I just ask because a few of the different things you said in this thread make it seem like you're not familiar with using joins. All of the "use a hash table in perl" and "where other_table_id in (select id from other_table where ..." make me wonder. I ask because, if you're trying to force the issue too much, and avoid using joins where they are appropriate, you could be hamstringing your database a little bit. Most databases have what is known as something along the lines of an "execution path optimizer". That is, you ask the database for information, and the database figures out the best method for retrieving that data. If you descibe the data that you want by using join syntax, the datbase has more freedom to optimize your query. Also, utilizing joins to pull all the info you need in a single query (rather than running multiple queries and joining in perl with hash tables), you can reduce network round-trip time to your database. Of course, there are always going to be some good excuses to do your joins in perl rather than on the DB... but the default approach should really be with joins. ------------ :Wq Not an editor command: Wq	[reply]