in reply to Re^2: DBI: passing undef as an argument
in thread DBI: passing undef as an argument

This is not nonsense.

SQL's use of "NULL" is inherently flawed as the relational model does not work with "NULL" values as implemented by SQL. Codd himself (in "The Relational Model for Database Management: Version 2) suggested that the SQL implementation of NULL was flawed and should be replaced by two distinct NULL-type markers. The markers he proposed were to stand for "Missing but Applicable" and "Missing but Inapplicable", known as A-values and I-values, respectively. Codd's recommendation, if accepted, would have required the implementation of a four-valued logic in SQL. None of the common RDBMs in use do this and neither does any SQL standard.

Of course, the SQL standard is a flawed implementation of the relational model in any case, but as it is "the best" (meaning the only one we have), we will have to live with it.

To give an example of how flawed the use of "NULL" is, consider the following:

  1. A database table T with two columns: ID-number and Amount.
  2. The following query: SELECT ID-number, Amount FROM T WHERE Amount = 10 OR NOT(Amount = 10)
  3. This query should return ALL rows, since either the amount is equal to 10 or it is not. It does not matter whether the amount is actually unknown: it can only be "10" or "not 10". Yet this query will not return a row where "Amount = NULL".
Although the answer given by the SQL query is correct according to the truth tables of ternary ("three valued") logic, it is plainly wrong in the real world and one of the tenets of the relational database model is that you must be able to construct predicates on basis of the relation which are true in the real world. This is the so-called "Closed World Assumption": Everything stated by the database, either explicitly or implicitly, is true; everything else is false. The use of NULL as implemented by SQL breaks this assumption.

I agree that "NULL" comes in handy in using as a placeholder for an unknown value, but in many cases the use of "NULL" indicates that we have not been thinking the design of our database through in all its aspects and that something should be changed there.

CountZero

A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Replies are listed 'Best First'.
Re^4: DBI: passing undef as an argument
by ikegami (Patriarch) on Aug 12, 2009 at 14:52 UTC

    You trying to prove it shouldn't be used by saying we should use two different types of it.

    You trying to prove it shouldn't be used by saying it's handy.

    You explain that it's flawed, but something being flawed shouldn't necessarily be avoided.

    To say the least, you haven't made your point.

      The two different types of "NULL" solve the problems the one "NULL" causes.

      It may be handy in its implemention, but it is heresy in the model.

      As the implementation of the relational model (i.e. SQL) is already flawed in several respects if one looks at the requirements of the relational model, you cannot mend it simply by avoiding to use "NULL", but neither should one compound the problems by an indiscriminate use of "NULL".

      I wasn't trying to "make a point", I just hoped to explain that the use of "NULL" is not free from problems.

      CountZero

      A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re^4: DBI: passing undef as an argument
by ruzam (Curate) on Aug 09, 2009 at 02:27 UTC
    I have to disagree. Null as a place holder for an unknown value is a perfectly acceptable use of null. You've contrived an example that only shows how flawed the design was to allow null values into the database in the first place. It's not the 'null' or the 'select' that fails to count null values at fault, it's the table definition that allows for null values to be entered when none are expected to be there. If nulls are not part of your world then don't create tables that allow null values to be entered.
Re^4: DBI: passing undef as an argument
by dsheroh (Monsignor) on Aug 09, 2009 at 09:18 UTC
    It seems to me that the Closed World Assumption demands that your example must return the "plainly wrong" result that it does:

    This is the so-called "Closed World Assumption": Everything stated by the database, either explicitly or implicitly, is true; everything else is false.

    The database in your example does not state, either explicitly or implicitly, that Amount is 10, therefore this is false. It also does not state that Amount is not 10, therefore this is also false.

      The database in your example does not state, either explicitly or implicitly, that Amount is 10, therefore this is false. It also does not state that Amount is not 10, therefore this is also false.
      In the real world, you cannot have an amount that is at the same time not 10 and not not 10. It has to be one of both. The amount is perhaps unknown (if that is what you mean to express by having a "NULL"-value in this field). But even if it is unknown, it has to be 10 or not 10 (but we do not know which one). So a query that asks for all rows where the amount is 10 or the amount is not 10, must return all rows. SQL does not return the rows where the amount is "NULL" and that is where it is flawed and gives a wrong result. The relational model cannot work with a "NULL"-value. If you want to have a value for "unknown", you must have TWO values for unknown (as per CODD).

      CountZero

      A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        In the real world, you cannot have an amount that is at the same time not 10 and not not 10. It has to be one of both.

        My eldest son's age is neither 10 nor not 10. I don't have any children.

        If you were to make a list of everyone whose eldest son is either 10 or not-10 years old, I would be rightly excluded from that list, just as I would be if you queried select from monks where son_age = 10 or not son_age = 10 and my son_age were NULL.

        So a query that asks for all rows where the amount is 10 or the amount is not 10, must return all rows. SQL does not return the rows where the amount is "NULL" and that is where it is flawed and gives a wrong result.

        I would argue that the flaw in this case is not NULL, but rather the lack of higher-order logic which is capable of recognizing that "X or not X" should always be true. By the Closed World Assumption that you mentioned, X and not X must each individually be considered false when dealing with an unknown value and "false or false" is false.

        Even if we go to tri-state logic (or N-state), the truth values of "unknown number is 10" and "unknown number is not 10" are both "maybe" and the result of "maybe or maybe" is another "maybe". The only way to get a "true" result from "X or not X" when X is unknown is to recognize that one condition is the negation of the other and, therefore, one of them must be true even if we can't identify which it is. This has absolutely nothing to do with NULL, beyond NULL being a common way (probably the most common way) of indicating an unknown value.

        And that still wouldn't properly address cases such as the age of my son, where X and not-X really are both false, despite one being the logical negation of the other.

        If you want to have a value for "unknown", you must have TWO values for unknown (as per CODD).

        I'm not particularly concerned with close adherence to the relational model, but I have already said a couple times here that things would be improved if there were distinct values for "UNKNOWN" and "EMPTY" rather than rolling both functions into NULL. You've mentioned your two desired unknowns as "Missing but Applicable" and "Missing but Inapplicable", which strike me as orthogonal to my two: I don't consider a value which is known to be nonexistent to be "missing" (although I suppose that might be the intended meaning of "missing but inapplicable") and your two still don't provide a clear way of indicating a value whose state is entirely unknown (it's definitely missing, but we don't know whether it's applicable or not).