in reply to Re^3: DBI: passing undef as an argument
in thread DBI: passing undef as an argument

It seems to me that the Closed World Assumption demands that your example must return the "plainly wrong" result that it does:

This is the so-called "Closed World Assumption": Everything stated by the database, either explicitly or implicitly, is true; everything else is false.

The database in your example does not state, either explicitly or implicitly, that Amount is 10, therefore this is false. It also does not state that Amount is not 10, therefore this is also false.

  • Comment on Re^4: DBI: passing undef as an argument

Replies are listed 'Best First'.
Re^5: DBI: passing undef as an argument
by CountZero (Bishop) on Aug 09, 2009 at 13:45 UTC
    The database in your example does not state, either explicitly or implicitly, that Amount is 10, therefore this is false. It also does not state that Amount is not 10, therefore this is also false.
    In the real world, you cannot have an amount that is at the same time not 10 and not not 10. It has to be one of both. The amount is perhaps unknown (if that is what you mean to express by having a "NULL"-value in this field). But even if it is unknown, it has to be 10 or not 10 (but we do not know which one). So a query that asks for all rows where the amount is 10 or the amount is not 10, must return all rows. SQL does not return the rows where the amount is "NULL" and that is where it is flawed and gives a wrong result. The relational model cannot work with a "NULL"-value. If you want to have a value for "unknown", you must have TWO values for unknown (as per CODD).

    CountZero

    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      In the real world, you cannot have an amount that is at the same time not 10 and not not 10. It has to be one of both.

      My eldest son's age is neither 10 nor not 10. I don't have any children.

      If you were to make a list of everyone whose eldest son is either 10 or not-10 years old, I would be rightly excluded from that list, just as I would be if you queried select from monks where son_age = 10 or not son_age = 10 and my son_age were NULL.

      So a query that asks for all rows where the amount is 10 or the amount is not 10, must return all rows. SQL does not return the rows where the amount is "NULL" and that is where it is flawed and gives a wrong result.

      I would argue that the flaw in this case is not NULL, but rather the lack of higher-order logic which is capable of recognizing that "X or not X" should always be true. By the Closed World Assumption that you mentioned, X and not X must each individually be considered false when dealing with an unknown value and "false or false" is false.

      Even if we go to tri-state logic (or N-state), the truth values of "unknown number is 10" and "unknown number is not 10" are both "maybe" and the result of "maybe or maybe" is another "maybe". The only way to get a "true" result from "X or not X" when X is unknown is to recognize that one condition is the negation of the other and, therefore, one of them must be true even if we can't identify which it is. This has absolutely nothing to do with NULL, beyond NULL being a common way (probably the most common way) of indicating an unknown value.

      And that still wouldn't properly address cases such as the age of my son, where X and not-X really are both false, despite one being the logical negation of the other.

      If you want to have a value for "unknown", you must have TWO values for unknown (as per CODD).

      I'm not particularly concerned with close adherence to the relational model, but I have already said a couple times here that things would be improved if there were distinct values for "UNKNOWN" and "EMPTY" rather than rolling both functions into NULL. You've mentioned your two desired unknowns as "Missing but Applicable" and "Missing but Inapplicable", which strike me as orthogonal to my two: I don't consider a value which is known to be nonexistent to be "missing" (although I suppose that might be the intended meaning of "missing but inapplicable") and your two still don't provide a clear way of indicating a value whose state is entirely unknown (it's definitely missing, but we don't know whether it's applicable or not).

        My eldest son's age is neither 10 nor not 10. I don't have any children.
        In which case the database would have a field "number_of_children", next to "age_of_oldest_child". If you fill in "0" for number_of_children the database application should disregard whatever is in "age_of_oldest_child (and/or would provide a suitable default value, such as zero or 999 or whatever, but not "NULL"). This is the only way to differentiate between someone having no children and someone who just forgot to enter an age (in which case that latter person will repeatedly be asked to enter the age information, until it is done). The value "NULL" cannot make this distinction by itself. Thank you for this argument, it is a good one to prove my point.

        the truth values of "unknown number is 10" and "unknown number is not 10" are both "maybe" and the result of "maybe or maybe" is another "maybe".
        So "NULL" can mean "Unknown data", "Missing data" or "Maybe" and there is no way to know what is actually meant. How can you ever be certain that the results you ask from the database are correct? BTW, someone's age can never be "maybe 10 years". Perhaps you don't know his age, but it is never "maybe 10". The question was: "Select all people in the database whose age is 10 or whose age is not 10". You are trying to answer a different question: "Select all people in the database of which we are certain their age is 10 or of which we are certain their age is not 10". If you allow "NULL" in the age field, at best you can only answer the second question and then only if you somehow define "NULL" to mean "We are uncertain of the age", but that definition of "NULL" is information you cannot store in the database (so you then allow rules that govern the use of the database to be external of it, which is bad). If you want to implement a "of which we are certain" part in the question you really need a separate (boolean) field ("Age known?"), but never a "NULL".

        The relational model uses classical ("Aristotelian") logic, so you cannot have X and non-X both be true or false at the same time. The value of "NULL" as not being TRUE nor FALSE at the same time, simply has no place in the relational model. BUT, when I speak about the relational model, I speak about the theoretical basis of SQL of which the present RDBMS are implementations (or rather second order implementations, "SQL" being the first order implemenation) and all these implementations have severe flaws as regards fully following the model. That doesn't mean however that we should not try to remain as close to the relational model as we can and not compound the errors of the implementations even more by a haphazard use of "NULL". That was really all I meant to say with my initial answer to the OP.

        CountZero

        A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James