comment on

The easy bit first: you and I appear to be in complete agreement about NULLs.

Now for the hard bit:

Your queries were improperly designed. You're conflating two questions into one query, so you have a hacked-up query (on top of improperly-designed tables).

There are two issues with that statement. First, the tables were not improperly-designed. They had bad data in them. This is a very important distinction which goes to the heart of the matter. More on this in a moment.

As for the "improperly designed" queries, I'll stipulate that they are, but only because SQL is not relational. In reading the queries, someone new to SQL could easily conclude (we see this often enough) that those queries are logically equivalent. It's only SQL's poor support for the relational model which makes those query results different. In fact, Date goes on to show 12 different ways of writing that query (including yours) which return 9 different results but which would all return the same, correct result if operations on relations (tables, if you will) returned sets. In fact, one of the fundamental points of the relational model is that a relation (what we normally think of as a table) is an unordered set (in the mathematical sense) of tuples. Any operation on one or more relations must return another relation -- in other words, it must return a set. This includes selecting, inserting or updating information. If operations violate this, then the system implementing this operation is by definition not relational.

This gets back to the first point about the bad data in the tables. First, a set is a collection of unique items. There is no such thing as the set of integers (3,3,2). That should be the set of integers (3,2). Further, the union of (1,2,3) and (2,3,4) yields (1,2,3,4), not (1,2,2,3,3,4). SQL allows us to add duplicate tuples in tables. That's one of the reasons they're called "tables" and not "relations". Sure, I can put a unique constraint across all of the columns, but why should I? They should be unique in the first place. There's no reason those duplicates should exist and having SQL graciously allow us a constraint which should be there in the first place doesn't fix the problem (and I tell ya, it's real fun finding out that some maintenance programmer put a few thousand duplicate rows in a lookup table with no unique key to let me delete the extras).

So right off the bat, SQL allows us to violate the relational model when inserting records. Of course, one could argue that "that's just bad database design". Fine, let's stipulate that you won't enter duplicate rows.

Now, if you were to provide something in at least 3NF that demonstrates your point about the lack of relational theory, then I'll be glad to learn from you. But, don't start out with something that's not even 1NF, conclude that relational theory is completely ignored by RDBMS vendors, and expect to not be taken to task about it.

Again, the mere fact that SQL allows duplicate rows in tables means that SQL ignores the relational model and DBMS vendors are largely responsible for not providing us with tools which respect the model. However, if that's not enough to convince you ...

So a salesperson walks up to an accountant and says "Bob, for our customers with excellent credit ratings, what cities do they live in? I need to plan flights through those cities."

"What do you mean by 'excellent credit rating', Alice?"

"For the sake of argument, let's include everyone with a credit rating greater than 700."

"OK, let's see ... there's London. Oh, and there's London. And Paris. And London. London again, Athens, London, Paris, Moscow, London and Paris."

"That's a hell of a flight."

The answer Bob should have given Alice is "London, Paris, Moscow, and Athens". Repeating those city names over and over without reason doesn't make his assertion any more true. In fact, it almost makes the response non-sensical and if your accountant always answered questions that way, you'd think he's mental. But that's the sort of answer this query gives:

SELECT city
FROM   cities, customers
WHERE  customers.city_id = cities.city_id
  AND  customers.credit_rating > 700
[download]

That could easily be on a database in 3NF. There's nothing about the structure of the query which implies that the database is poorly designed.

Alice shouldn't have to tell Bob to give her "DISTINCT" cities because repeating the city names not only makes no sense, it helps to confuse the issue. SQL does not return sets by default. If it returns duplicates, it's not a set and thus the system isn't relational. Sets do not have duplicates. Now in the example above, if Alice is really stupid, she's going to have one hell of a long flight. Fortunately, Alice is not really stupid. Unfortunately, software is. If, instead of Alice, it's some software receiving the results of that query, things could go disastrously wrong.

If SQL followed relational theory, simple queries like this would return correct results. Instead, the SQL programmer has to remember that SQL will happily churn out garbage. The programmer is then put in the awkward position of having to either put "SELECT DISTINCT" everywhere or remember that "SELECT DISTINCT" can have horrible impacts on performance and thus should be used very selectively. If SQL was relational, the programmer would never have to worry about this.

What I'd like to hear, and I haven't, is a justification for the "not returning sets" behavior. If there any realistic justification one can give for queries to return duplicate information, I'd love to hear them. (The "but it's too slow otherwise" response is faulty because it confuses implementation with behavior.)

Cheers,
Ovid

New address of my CGI Course.

In reply to Re^2: (OT) Why SQL Sucks (with a little Perl to fix it) by Ovid
in thread (OT) Why SQL Sucks (with a little Perl to fix it) by Ovid

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.