comment on

One way to look at whether a denormalized approach will cause problems in the future is to see if the approach caused problems in the past. If you have a denormalized legacy database you can run queries that might reveal embarrassing data problems.

For example, I ran a query to find unreasonable values in a database that held data about electronic components. I found many problems quickly. After that, a normalized version of the database was implemented with many sanity checks and metadata stored in tables.

The good news is that the new database has much higher data quality. A user interface to the database can be created directly from the meta-data.

The bad news is that the sanity checks require maintainance, and you need to join at least six tables to get anything out of the database. Many users of the database don't understand the schema, so they tend to dump large portions of it (effectively denormalizing it themselves). Text filters are then used to get at the data.

Normalized databases require deeper understanding of the data. Many people will gladly work hard so that they don't have to understand something.

The highest performance approach that uses this wonderfully normalized database queries the database in a mod_perl program, creating a denormalized, lightning-fast, easy-to-search data structure for accessing the data.

Programs that need to update the database still have to work with the complicated schema.

To know that this approach would be useful, we had to know something important in advance: we wanted high-speed queries and we knew that updates would only occur in a well-controlled nightly batch job.

A deep understanding of the fundamental problem that your project is trying to solve will enable you to do a much better job of analyzing the pros and cons of design issues. It's great to think things like "Normalization is almost always good." It's even more impressive to know something about the application that will show how well your approach will work in the long run.

It is a really good sign that you are willing to ask about such deep issues and keep an open mind. Keep thinking! Avoiding these issues is false laziness.

Update: Tried to fix some grammar.

It should work perfectly the first time! - toma

In reply to Re: Database Design Issues - OT by toma
in thread Database Design Issues - OT by Ovid

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.