
(Below, when I speak about an RDBMS, I am speaking about Oracle9i. Almost every statement will be applicable to the majors, but they are correct for Oracle.)

First, the tables were not improperly-designed. They had bad data in them. This is a very important distinction which goes to the heart of the matter.

A table with bad data in it is an improperly-designed table. A properly designed table cannot, by virtue of it being properly designed, have bad data. It might have incorrect data, but the data cannot be bad. I will address your further points in the order you presented them, but this is a critical understanding of RDBMSes that you don't seem to have. (It's rare to see it in someone who hasn't been a DBA, so don't feel bad.)

Another critical understanding that you're missing - the "relational" part in an RDBMS isn't for the table - it's for both the row and how rows in different tables relate. All the columns in the row are related to the primary key column(s) and columns in different tables relate rows together. The "set" part comes in that the keys are unique, not that the results of a query are unique. A table is a set of primary keys, along with the values that they index into. The Perl analogy would be a hash. The keys of a hash are a set, but the values are not. This is a good thing.
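A minimal sketch of the hash analogy, using a Python dict as a stand-in for the Perl hash. The customer ids and city names are made up for illustration.

```python
# Hypothetical data: customer id (the "primary key") -> city (the "value").
city_by_customer = {1: "Paris", 2: "London", 3: "Paris"}

# Assigning to an existing key overwrites it; there is never a second key 2.
city_by_customer[2] = "Paris"

keys = list(city_by_customer)
values = list(city_by_customer.values())

# The keys form a set (no duplicates); the values are free to repeat.
keys_are_a_set = len(keys) == len(set(keys))
values_are_a_set = len(values) == len(set(values))

print(keys, values, keys_are_a_set, values_are_a_set)
```

The keys stay unique no matter what is stored under them, which is the property the primary key gives a table.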

No, SQL is not relational. If it were, it would be called RQL for "Relational Query Language," not "Structured Query Language." SQL was designed for Business Analysts, not programmers. It's a 4GL - one of those "natural language programming languages" that were touted in the '80s. That it's a complete failure both as a programming language and as a tool to work with databases is only to have been expected.

... a relation (what we normally think of as a table) is an unordered set (in the mathematical sense) of tuples. ... SQL allows us to add duplicate tuples in tables. That's one of the reasons they're called "tables" and not "relations". Sure, I can put a unique constraint across all of the columns, but why should I?

You're conflating so many different concepts here that I almost don't know where to start.

Most RDBMSes make a distinction between DML and DDL - the language one uses to query the tables (Data Manipulation Language), such as SQL's SELECT, and the language one uses to define the tables (Data Definition Language), such as CREATE TABLE. Neither of these is "relational." In fact, it doesn't make sense to talk about them as "relational." They can be used to interact with things that, if designed and built properly, are relational.
A table in a database doesn't make any sense unless it has a primary key. In fact, that is why the First Normal Form (or 1NF) is "Every row must have a primary key." Without that, you're absolutely correct in that the table is a bag and not a set. With that, the table is a set that defines a relation between the primary key column(s) and the dependent column(s). You don't need to have a unique constraint across all the columns, just across the one(s) that make up the primary key.
While you can think of the row as a tuple of all its columns, it's more accurate to think of it as a function that, given a tuple for the primary key column(s), returns a tuple for the dependent column(s). After all, a relation is nothing more than a function that maps items from one set to items in another set.
You are correct in that SQL will allow you to attempt to do anything you want to a given table. It's up to the table's designer(s) to have set the appropriate restrictions upon the table.
Tables are called tables and not relations because they aren't relations. The relation is within the row, not the table. (See above.)
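To make the bag-versus-set point concrete, here is a small sketch. It uses SQLite through Python's sqlite3 module as a stand-in, not Oracle, and the table names bag_cities and set_cities are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# No primary key: nothing stops identical rows, so the table is a bag.
conn.execute("CREATE TABLE bag_cities (city_id INTEGER, name TEXT)")
# Primary key on city_id: the keys form a set.
conn.execute("CREATE TABLE set_cities (city_id INTEGER PRIMARY KEY, name TEXT)")

for _ in range(2):
    conn.execute("INSERT INTO bag_cities VALUES (1, 'Paris')")
bag_rows = conn.execute("SELECT COUNT(*) FROM bag_cities").fetchone()[0]

conn.execute("INSERT INTO set_cities VALUES (1, 'Paris')")
try:
    conn.execute("INSERT INTO set_cities VALUES (1, 'Paris')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    # The constraint set by the table's designer refuses the second key 1.
    duplicate_rejected = True

# The same dependent value under a different key is perfectly fine.
conn.execute("INSERT INTO set_cities VALUES (2, 'Paris')")
set_rows = conn.execute("SELECT COUNT(*) FROM set_cities").fetchone()[0]

print(bag_rows, duplicate_rejected, set_rows)
```

Note that the constraint is only on the key column, not across all the columns, which is all the argument above requires.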

As for Bob and Alice, there's a very good reason why SQL not only gives back that answer, but SHOULD give back that answer, and it comes down to the Principle of Least Surprise.* Let's say you have the query SELECT foo FROM my_table; That returns 10 rows. Under the current system, that means that there are 10 rows in my_table and here are the values in the foo column for each one. Under your idea, that means there are 10 distinct foos in my_table, regardless of how many rows there are.

Now, what happens when I say SELECT foo, bar FROM my_table;? Under the current system, this still returns 10 rows. Under your system, it could return any number of rows, so long as each row it returns is a unique (foo,bar) combination. For people who don't understand anything about set theory, that's somewhat counter-intuitive.
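The difference in row counts can be sketched as follows, again with SQLite via Python's sqlite3 as a stand-in. The id column and the data in my_table are assumptions made up for the example, and SELECT DISTINCT plays the role of the "always return a set" behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, foo INTEGER, bar INTEGER)"
)
# Ten rows; foo cycles through 3 values and bar through 2 (invented data).
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(i, i % 3, i % 2) for i in range(1, 11)],
)

def count_rows(sql):
    """Return how many rows the given query produces."""
    return len(conn.execute(sql).fetchall())

# Current behaviour: the row count tracks the table, whatever you select.
plain_foo = count_rows("SELECT foo FROM my_table")
plain_foo_bar = count_rows("SELECT foo, bar FROM my_table")

# Set-style behaviour: the row count depends on which columns you ask for.
distinct_foo = count_rows("SELECT DISTINCT foo FROM my_table")
distinct_foo_bar = count_rows("SELECT DISTINCT foo, bar FROM my_table")

print(plain_foo, plain_foo_bar, distinct_foo, distinct_foo_bar)
```

With this data the plain queries both return 10 rows, while the DISTINCT versions return 3 and 6, so adding a column changes how many rows come back.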

Believe me - I do understand your consternation. The idea that an operation upon a set returns a collection is counter-intuitive to those who understand any set theory at all. But, I point back to my earlier point - the set isn't the rows in the table taken as a whole - it's the range of the function between the primary key and the dependent columns. As long as your query returns the primary key, it is a set. If your query doesn't, then it doesn't return a set.

To elucidate, let's look very quickly at what a SELECT does, under the hood. The basic operation, as you've noted, is to take a set (the PK tuple), apply a set of restrictions on it (either the PK tuple itself or the values tuple), then return the requested values from all rows that satisfy those restrictions. As long as your table(s) being queried was a set to begin with, the results will also be a set. However, your view of the results may or may not be a set, depending on what you've chosen to see.

Let's take your specific example:

SELECT cities.name
FROM   cities, customers
WHERE  customers.city_id = cities.city_id
  AND  customers.credit_rating > 700

I'm going to rewrite as:

SELECT (cities.name)
  FROM (customers JOIN cities USING (city_id))
 WHERE (customers.credit_rating > 700)

Assuming that customers and cities are both sets, then we have four sets involved in that query.

cities
customers
the resultset
the fromset

Your resultset is a set, believe it or not. It's the set of all rows from the intermediate table "customersXcities" that satisfy the criterion of customers.credit_rating > 700. This intermediate table has the same PK as your customers table. Think about that for a second - you're not querying the cities table directly. You're querying an intermediate table. Each row in your resultset is a row from that intermediate table. Then, the SELECT clause applies a set of vertical black strips to the resultset, each on top of a column. You turn off the strip by requesting that column's values be given to you. The SELECT clause isn't a set operation.
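Here is a rough sketch of the "black strips" idea, run on SQLite via Python's sqlite3 rather than Oracle. The customer_id and name columns and all of the data are assumptions invented for the example; only city_id, cities.name and credit_rating come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (city_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, name TEXT, "
    "city_id INTEGER, credit_rating INTEGER)"
)
conn.executemany("INSERT INTO cities VALUES (?, ?)",
                 [(1, "Paris"), (2, "London")])
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", [
    (10, "Alice", 1, 750),
    (11, "Bob", 1, 720),
    (12, "Carol", 2, 650),
    (13, "Dave", 2, 800),
])

# Every strip left on except cities.name: the same name can show up twice.
names = [row[0] for row in conn.execute(
    "SELECT cities.name "
    "FROM customers JOIN cities USING (city_id) "
    "WHERE customers.credit_rating > 700 "
    "ORDER BY customers.customer_id"
)]

# Strip removed from the intermediate table's PK: every row is unique again.
keyed = conn.execute(
    "SELECT customers.customer_id, cities.name "
    "FROM customers JOIN cities USING (city_id) "
    "WHERE customers.credit_rating > 700 "
    "ORDER BY customers.customer_id"
).fetchall()

print(names)
print(keyed)
```

Both queries select the same three rows of the intermediate table. Only the view of them differs, and the view is a set exactly when the customers PK is visible.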

*: Don't give me that shocked look - you had to have known that the PoLS would make an appearance in this discussion. If only so that I could tweak you about how your surprise was anticipated before you were born. :-)

My criteria for good software:

Does it work?
Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

In reply to Re^3: (OT) Why SQL Sucks (with a little Perl to fix it) by dragonchild
in thread (OT) Why SQL Sucks (with a little Perl to fix it) by Ovid
