in reply to Re^4: DBI: passing undef as an argument
in thread DBI: passing undef as an argument

The database in your example does not state, either explicitly or implicitly, that Amount is 10, therefore this is false. It also does not state that Amount is not 10, therefore this is also false.
In the real world, you cannot have an amount that is at the same time not 10 and not not 10. It has to be one of both. The amount is perhaps unknown (if that is what you mean to express by having a "NULL"-value in this field). But even if it is unknown, it has to be 10 or not 10 (but we do not know which one). So a query that asks for all rows where the amount is 10 or the amount is not 10, must return all rows. SQL does not return the rows where the amount is "NULL" and that is where it is flawed and gives a wrong result. The relational model cannot work with a "NULL"-value. If you want to have a value for "unknown", you must have TWO values for unknown (as per CODD).

CountZero

A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

  • Comment on Re^5: DBI: passing undef as an argument

Replies are listed 'Best First'.
Re^6: DBI: passing undef as an argument
by dsheroh (Monsignor) on Aug 10, 2009 at 09:31 UTC
    In the real world, you cannot have an amount that is at the same time not 10 and not not 10. It has to be one of both.

    My eldest son's age is neither 10 nor not 10. I don't have any children.

    If you were to make a list of everyone whose eldest son is either 10 or not-10 years old, I would be rightly excluded from that list, just as I would be if you queried select from monks where son_age = 10 or not son_age = 10 and my son_age were NULL.

    So a query that asks for all rows where the amount is 10 or the amount is not 10, must return all rows. SQL does not return the rows where the amount is "NULL" and that is where it is flawed and gives a wrong result.

    I would argue that the flaw in this case is not NULL, but rather the lack of higher-order logic which is capable of recognizing that "X or not X" should always be true. By the Closed World Assumption that you mentioned, X and not X must each individually be considered false when dealing with an unknown value and "false or false" is false.

    Even if we go to tri-state logic (or N-state), the truth values of "unknown number is 10" and "unknown number is not 10" are both "maybe" and the result of "maybe or maybe" is another "maybe". The only way to get a "true" result from "X or not X" when X is unknown is to recognize that one condition is the negation of the other and, therefore, one of them must be true even if we can't identify which it is. This has absolutely nothing to do with NULL, beyond NULL being a common way (probably the most common way) of indicating an unknown value.

    And that still wouldn't properly address cases such as the age of my son, where X and not-X really are both false, despite one being the logical negation of the other.

    If you want to have a value for "unknown", you must have TWO values for unknown (as per CODD).

    I'm not particularly concerned with close adherence to the relational model, but I have already said a couple times here that things would be improved if there were distinct values for "UNKNOWN" and "EMPTY" rather than rolling both functions into NULL. You've mentioned your two desired unknowns as "Missing but Applicable" and "Missing but Inapplicable", which strike me as orthogonal to my two: I don't consider a value which is known to be nonexistent to be "missing" (although I suppose that might be the intended meaning of "missing but inapplicable") and your two still don't provide a clear way of indicating a value whose state is entirely unknown (it's definitely missing, but we don't know whether it's applicable or not).

      My eldest son's age is neither 10 nor not 10. I don't have any children.
      In which case the database would have a field "number_of_children", next to "age_of_oldest_child". If you fill in "0" for number_of_children the database application should disregard whatever is in "age_of_oldest_child (and/or would provide a suitable default value, such as zero or 999 or whatever, but not "NULL"). This is the only way to differentiate between someone having no children and someone who just forgot to enter an age (in which case that latter person will repeatedly be asked to enter the age information, until it is done). The value "NULL" cannot make this distinction by itself. Thank you for this argument, it is a good one to prove my point.

      the truth values of "unknown number is 10" and "unknown number is not 10" are both "maybe" and the result of "maybe or maybe" is another "maybe".
      So "NULL" can mean "Unknown data", "Missing data" or "Maybe" and there is no way to know what is actually meant. How can you ever be certain that the results you ask from the database are correct? BTW, someone's age can never be "maybe 10 years". Perhaps you don't know his age, but it is never "maybe 10". The question was: "Select all people in the database whose age is 10 or whose age is not 10". You are trying to answer a different question: "Select all people in the database of which we are certain their age is 10 or of which we are certain their age is not 10". If you allow "NULL" in the age field, at best you can only answer the second question and then only if you somehow define "NULL" to mean "We are uncertain of the age", but that definition of "NULL" is information you cannot store in the database (so you then allow rules that govern the use of the database to be external of it, which is bad). If you want to implement a "of which we are certain" part in the question you really need a separate (boolean) field ("Age known?"), but never a "NULL".

      The relational model uses classical ("Aristotelian") logic, so you cannot have X and non-X both be true or false at the same time. The value of "NULL" as not being TRUE nor FALSE at the same time, simply has no place in the relational model. BUT, when I speak about the relational model, I speak about the theoretical basis of SQL of which the present RDBMS are implementations (or rather second order implementations, "SQL" being the first order implemenation) and all these implementations have severe flaws as regards fully following the model. That doesn't mean however that we should not try to remain as close to the relational model as we can and not compound the errors of the implementations even more by a haphazard use of "NULL". That was really all I meant to say with my initial answer to the OP.

      CountZero

      A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        In which case the database would have a field "number_of_children", next to "age_of_oldest_child".

        And what if the number_of_children is unknown? Do you create yet another field for this? Do you keep qualifying your missing data definition until you run out of physical fields to add to the table? The fact that you have defined the field as containing nulls to begin with implies that the field is no longer just a value, but a value and a state pair. You can't check the value without ALSO checking the state, regardless of what NULL may be used to signify. Null might just as well indicate that a joined table has no matching row, which has nothing to do with the values or the reason they're stored. If you're going to argue that null has no place in the relational model, then you're going to have to support an argument that JOIN has no place in the relational model first.

        How can you ever be certain that the results you ask from the database are correct?

        By asking for results that can be certain. "Select all people in the database who's age is not 10 or who's age IS NULL" just as you would "Select all people in the database who's age is not 10 or who's number_of_children is 0". I don't see any inconsistency here. If you chose to use nulls, then you MUST test for nulls or accept the consequences. Your argument appears to claim that testing values fails when nulls are involved but from my point of view, your argument fails to accept that the null state needs to be part of the test. It's not the "Aristotelian" logic that's been subverted, but the test which is incomplete.

        I think the real point of your argument may be that working with null values is prone to error and should not be used without careful consideration of the traps that can result. If this is so, then I heartily agree. It's far easier to simply avoid nulls then it is to try to stay on top of situations where they may misbehave.

        And if you do any DB2 interaction through COBOL you'll appreciate not having to code in (and check) the extra 'null indicator' flags ;)