Re^7: DBI: passing undef as an argument

My eldest son's age is neither 10 nor not 10. I don't have any children.
In which case the database would have a field "number_of_children", next to "age_of_oldest_child". If you fill in "0" for number_of_children the database application should disregard whatever is in "age_of_oldest_child (and/or would provide a suitable default value, such as zero or 999 or whatever, but not "NULL"). This is the only way to differentiate between someone having no children and someone who just forgot to enter an age (in which case that latter person will repeatedly be asked to enter the age information, until it is done). The value "NULL" cannot make this distinction by itself. Thank you for this argument, it is a good one to prove my point.

the truth values of "unknown number is 10" and "unknown number is not 10" are both "maybe" and the result of "maybe or maybe" is another "maybe".

So "NULL" can mean "Unknown data", "Missing data" or "Maybe" and there is no way to know what is actually meant. How can you ever be certain that the results you ask from the database are correct? BTW, someone's age can never be "maybe 10 years". Perhaps you don't know his age, but it is never "maybe 10". The question was: "Select all people in the database whose age is 10 or whose age is not 10". You are trying to answer a different question: "Select all people in the database of which we are certain their age is 10 or of which we are certain their age is not 10". If you allow "NULL" in the age field, at best you can only answer the second question and then only if you somehow define "NULL" to mean "We are uncertain of the age", but that definition of "NULL" is information you cannot store in the database (so you then allow rules that govern the use of the database to be external of it, which is bad). If you want to implement a "of which we are certain" part in the question you really need a separate (boolean) field ("Age known?"), but never a "NULL".

The relational model uses classical ("Aristotelian") logic, so you cannot have X and non-X both be true or false at the same time. The value of "NULL" as not being TRUE nor FALSE at the same time, simply has no place in the relational model. BUT, when I speak about the relational model, I speak about the theoretical basis of SQL of which the present RDBMS are implementations (or rather second order implementations, "SQL" being the first order implemenation) and all these implementations have severe flaws as regards fully following the model. That doesn't mean however that we should not try to remain as close to the relational model as we can and not compound the errors of the implementations even more by a haphazard use of "NULL". That was really all I meant to say with my initial answer to the OP.

CountZero

A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Comment on Re^7: DBI: passing undef as an argument

Replies are listed 'Best First'.
Re^8: DBI: passing undef as an argument by ruzam (Curate) on Aug 11, 2009 at 02:57 UTC
In which case the database would have a field "number_of_children", next to "age_of_oldest_child". And what if the number_of_children is unknown? Do you create yet another field for this? Do you keep qualifying your missing data definition until you run out of physical fields to add to the table? The fact that you have defined the field as containing nulls to begin with implies that the field is no longer just a value, but a value and a state pair. You can't check the value without ALSO checking the state, regardless of what NULL may be used to signify. Null might just as well indicate that a joined table has no matching row, which has nothing to do with the values or the reason they're stored. If you're going to argue that null has no place in the relational model, then you're going to have to support an argument that JOIN has no place in the relational model first. How can you ever be certain that the results you ask from the database are correct? By asking for results that can be certain. "Select all people in the database who's age is not 10 or who's age IS NULL" just as you would "Select all people in the database who's age is not 10 or who's number_of_children is 0". I don't see any inconsistency here. If you chose to use nulls, then you MUST test for nulls or accept the consequences. Your argument appears to claim that testing values fails when nulls are involved but from my point of view, your argument fails to accept that the null state needs to be part of the test. It's not the "Aristotelian" logic that's been subverted, but the test which is incomplete. I think the real point of your argument may be that working with null values is prone to error and should not be used without careful consideration of the traps that can result. If this is so, then I heartily agree. It's far easier to simply avoid nulls then it is to try to stay on top of situations where they may misbehave. And if you do any DB2 interaction through COBOL you'll appreciate not having to code in (and check) the extra 'null indicator' flags ;)	[reply]

Replies are listed 'Best First'.

Re^8: DBI: passing undef as an argument
by ruzam (Curate) on Aug 11, 2009 at 02:57 UTC

In which case the database would have a field "number_of_children", next to "age_of_oldest_child".

And what if the number_of_children is unknown? Do you create yet another field for this? Do you keep qualifying your missing data definition until you run out of physical fields to add to the table? The fact that you have defined the field as containing nulls to begin with implies that the field is no longer just a value, but a value and a state pair. You can't check the value without ALSO checking the state, regardless of what NULL may be used to signify. Null might just as well indicate that a joined table has no matching row, which has nothing to do with the values or the reason they're stored. If you're going to argue that null has no place in the relational model, then you're going to have to support an argument that JOIN has no place in the relational model first.

How can you ever be certain that the results you ask from the database are correct?

By asking for results that can be certain. "Select all people in the database who's age is not 10 or who's age IS NULL" just as you would "Select all people in the database who's age is not 10 or who's number_of_children is 0". I don't see any inconsistency here. If you chose to use nulls, then you MUST test for nulls or accept the consequences. Your argument appears to claim that testing values fails when nulls are involved but from my point of view, your argument fails to accept that the null state needs to be part of the test. It's not the "Aristotelian" logic that's been subverted, but the test which is incomplete.

I think the real point of your argument may be that working with null values is prone to error and should not be used without careful consideration of the traps that can result. If this is so, then I heartily agree. It's far easier to simply avoid nulls then it is to try to stay on top of situations where they may misbehave.

And if you do any DB2 interaction through COBOL you'll appreciate not having to code in (and check) the extra 'null indicator' flags ;)

[reply]