in reply to Database Design Issues - OT

I don't want to say that your DBA is flat-out wrong.

But it sounds like it.

First a word about optimization. Don't! Next, an explanation of that statement. Believe Knuth when he says that premature optimization is the root of all evil. For the full explanation, read what Code Complete has to say on the topic.

Your DBA is definitely making the mistake of trying to optimize as he goes. If you are going to avoid using your database as a relational database, then you are better off using flat files. OK, for maximum read-only performance you are probably going to be best off using dbm files.

Note that this is not a recommendation. (Much as I like and respect what Sleepycat has accomplished.)

Instead stop thinking in fancy words like "normalization" and think about this as a programming problem. All of the important reasons why normalization is good come down to the basic reasons why you wouldn't want to duplicate information in any other context. Normalizing a database is another aspect of creating a well-factored program. OK, so your basic tools for modularizing what goes on are slightly different, but if you are trying to put a modular front end on it (which it sounds like you are) then the need to modularize the database is just an extension of the need to modularize your whole system.
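To make that concrete, here is a minimal sketch of what "not duplicating information" means in database terms. The table and column names are made up for illustration; your actual schema will look different.

    -- Denormalized: the customer's name and address are repeated on
    -- every order row. Fixing an address means hunting down every copy.
    CREATE TABLE orders_denormalized (
        order_id      INT           NOT NULL,
        customer_name VARCHAR(80)   NOT NULL,
        customer_addr VARCHAR(200)  NOT NULL,
        order_date    DATETIME      NOT NULL,
        amount        NUMERIC(10,2) NOT NULL
    )

    -- Normalized: the customer is factored out once, just as you would
    -- factor repeated code into a single subroutine.
    CREATE TABLE customers (
        customer_id   INT           NOT NULL PRIMARY KEY,
        customer_name VARCHAR(80)   NOT NULL,
        customer_addr VARCHAR(200)  NOT NULL
    )

    CREATE TABLE orders (
        order_id      INT           NOT NULL PRIMARY KEY,
        customer_id   INT           NOT NULL,  -- points back at customers
        order_date    DATETIME      NOT NULL,
        amount        NUMERIC(10,2) NOT NULL
    )

The change-it-in-one-place property is exactly the argument you would make for pulling duplicated code into a subroutine.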

Now what people see when they first face a well-normalized database is similar to what they see when they first face a well-factored program. There is a large amount of structure, and you need to constantly work within the implied rules. When you just want to get something done, the normalization is rarely going to make your life easier. However, when you are trying to build a complex system, you will constantly find the normalization making it easier to "chunk" up to higher levels. And then when you need to change something important, it is going to be much easier for you to make that change, and you will be able to make it with far fewer fears that you will spend the next 6 months tracking down oopses from other things that needed to be changed. (That doesn't mean that you won't spend 6 months doing that; you will. But it will happen less often, and you will be free to consider far more fundamental changes in your application when the need arises.)

And if your modularization does cost you too much performance (unlikely if you have arranged to have a decent database and you are joining on keys, and joining against views where that makes sense), well then you optimize that after the fact. Just like you would for any other programming project. After all, you don't generally know ahead of time whether you will need to optimize, and if you do, the odds are very low that you would have guessed correctly where. But you definitely know what the costs of not modularizing up front are. And the losses in terms of reliability, development speed, etc. are both significant and readily predictable.

But databases are somewhat different. What is different about databases is that in the real world you tend to run into the need to reconsider performance in a database more often than you do with many other kinds of software. Any kind of modularization tends to work by introducing abstraction layers, and that typically creates run-time work. Now, as a link that merlyn pointed me at says, late binding is a good idea. However, it costs performance. And frequently when you are using a database, you are dealing with very large amounts of data, so performance is a real issue. When that happens, you either need to train your users to believe that the performance is normal, or you need to improve performance. Or both. But with a decent database engine and a well-normalized database with indexes for all of your important joins, it is unlikely to be an issue until you have hundreds of thousands or millions of records.
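For what "indexes for all of your important joins" looks like in practice, here is a sketch using the made-up customers/orders tables from above; your table and column names will differ.

    -- Index the key you join on, so the join stays cheap as the tables grow.
    CREATE INDEX orders_customer_ix ON orders (customer_id)

    -- The kind of query the application actually runs. With the index in
    -- place the engine can find a customer's orders without scanning the
    -- whole orders table.
    SELECT c.customer_name, o.order_date, o.amount
    FROM customers c, orders o
    WHERE o.customer_id = c.customer_id
      AND o.order_date >= '2001-01-01'

Nothing fancy: just make sure the columns you join on are keys, and that those keys are indexed.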

For instance I have a friend who works tracking consumer data. His databases tend to be in the terabytes, and they are that small only because he can't (yet) afford more disk space. (Ever thought about tracking all of the details on every credit-card purchase in the US? He would like to...) Needless to say, he frequently is forced to denormalize his data for performance reasons. However if he could normalize and have it run acceptably well, he would. In a flash. And the parts that he can normalize, he does.

But when the time comes to improve performance, your DBA's thoughts on what is fastest are likely to be wrong. As Steve McConnell said, you never know what is going to work out faster. A real-life example: I recently decided to try to speed up a view I use. It selects the most recent date I can use by doing a join of three tables, looking for the most recent date it can find in all 3. So I figured that it had to be faster to replace it with a table holding one value. After all, it is a single answer, and most queries against the view only want to pull out data from one of the tables. It must be faster to join 2 tables (one of them with only one row) than to do the more complex join on all 3 before joining back to one of them. Dead obvious, right? Wrong! What apparently is happening is that every query I do against said view also goes against one or more of the 3 tables in the view. But Sybase is smart enough to keep a cache of the partial results it needs in computing the view, and smart enough in my queries to join the result of the view against those cached partial results and not the full original tables. So my queries were faster with the complex 3-table join because there is a non-obvious cache that things get to run against! (The production problem that made me worry about this in the first place was solved with a memory upgrade that we needed for other reasons.)
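For the curious, the idea looked roughly like this. Treat it as a sketch with made-up table and column names, not the actual code:

    -- The view: the most recent date that shows up in all 3 tables.
    CREATE VIEW usable_date AS
    SELECT MAX(a.load_date) AS load_date
    FROM table_a a, table_b b, table_c c
    WHERE a.load_date = b.load_date
      AND b.load_date = c.load_date

    -- The "obvious" optimization: precompute that single value into a
    -- one-row table (refreshed by a job) and join against it instead.
    CREATE TABLE usable_date_cached (
        load_date DATETIME NOT NULL
    )

On paper the one-row table should win. In practice, because Sybase was already reusing the partial results behind the view, the 3-way join turned out to be the faster of the two.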

Now a final note. While I am clearly fond of normalizing databases, it is not an unmitigated good. This is an important point which is far too often forgotten. We so want to give aspiring programmers the advice that is usually good that we fail to tell them that none of this advice always holds. This point is one which I think I best made at This is not a flame. It is unfortunate that it was made in the middle of a heated discussion, and my apologies to princepawn for bringing it up again. But I think the points that I was trying to get princepawn to recognize are generally important, and relevant in particular to this thread.

But back to the case at hand. I do think that normalization of information is a good thing to do in your situation. It sounds like you may have to work to get your DBA to understand this fact. While doing that, try to keep in mind that your DBA is probably not used to thinking like a good programmer does. What he sees is always the immediate up-front cost of normalizing. What he misses is the persistent downside from not normalizing that will kill your productivity in the end. Try to make him see and understand that...
