Re^4: DBI: passing undef as an argument
by dsheroh (Monsignor) on Aug 09, 2009 at 09:08 UTC
|
The real world is often ambiguous. That's life. Would it be better if SQL supported two distinct values, NULL for "I don't know" and, say, EMPTY for "it is known to not exist"? Yeah, perhaps, but the ambiguity of having NULL serve both functions is decidedly less bad than deliberately inserting known-incorrect information1 just for the sake of having something there, regardless of whether it's meaningful (or accurate) or not.
I believe that we should design software for the convenience of the end user, not the convenience of the computer. Data integrity is a good thing, but making it absolutely iron-clad in all cases is not worth placing unreasonable requirements on the user. (It is in some cases, but far from all.)
Many years ago, I read a book2 which used the example of software which absolutely refuses to accept the entry of addresses lacking ZIP codes being considered acceptable, even though you would fire a secretary who would refuse to write down as much of an address as you know at the time, even if it's incomplete. I see little difference between banning addresses with unknown ZIP codes and banning NULL. In both cases you're demanding that the user must know everything about all the relevant data before allowing them to save any of it.
Granted, you could both ban NULLs and allow incomplete entry by creating secondary boolean fields attached to every field in the table to indicate which of them contain meaningful data and which are just arbitrary values that are there purely for the sake of having something in there. But, at that point, haven't you basically just reinvented NULL in a non-standard and (much) less convenient form?
1 e.g, Your suggestion of requiring an expiration date to be set even on records which don't expire, combined with a second boolean field whose purpose is essentially to say "ignore that other value - it's a lie".
2 I think it was Alan Cooper's About Face, but I'm not positive. | [reply] |
|
|
I fully agree with you that the use of "NULL" is some|many times very convenient and not too bad, in a "if you squint your eyes a little and carefully design your database and the apllication that it uses" kind of way.BUT, that is only the implementation of the relational model and not the relational model itself. The relational model cannot live with single valued "NULL"-values. The practical implementations of the relational model (aka RDBMS) use "NULL" all the time and society did not crash, so it cannot be that bad. However, in my book that is not a good reason to indiscriminately use "NULL" to mean "unknown" if the database itself cannot tell you what "unknown" means. So you must rely on external information to disambiguate its meaning. For example: what does an "NULL" means in the "salary" field? "This person is salaried but we do not know his exact salary" or "This person is unemployed and has no salary" or "This person is salaried and we know his salary but he did not allow us to store this infirmation in a public database as this info is considered private and confidential"? And as for your example of the address and the ZIP-code, I can easily think of an application where the zip-code info is absolutely necessary for it to work (perhaps to calculate shipping costs?) and where a "NULL" would break the aplication. And yes, by its very definition we must know all relevant data. If you allow the user to proceed without having given all relevant data, then either your application is doing it wrong or you asked the user to provide irrelevant data (i.e. you asked him to provide data which you do not strictly need) and that is probably as bad.
But, at that point, haven't you basically just reinvented NULL in a non-standard and (much) less convenient form? No, I did not. You can give the extra field a well defined meaning and this meaning is now internal to the database.
CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
| [reply] |
|
|
For example: what does an "NULL" means in the "salary" field? "This person is salaried but we do not know his exact salary" or "This person is unemployed and has no salary" or "This person is salaried and we know his salary but he did not allow us to store this infirmation in a public database as this info is considered private and confidential"?
I agree that this is definitely a case where a separate no_salary_reason column would be highly appropriate, but, even so, I would still go with a non-value (i.e., NULL) in the salary column itself rather than inserting a fictitious value which would then doom me to never again being able to say WHERE salary... without adding a AND not salary_value_is_fictitious.
And as for your example of the address and the ZIP-code, I can easily think of an application where the zip-code info is absolutely necessary for it to work (perhaps to calculate shipping costs?) and where a "NULL" would break the aplication. And yes, by its very definition we must know all relevant data. If you allow the user to proceed without having given all relevant data, then either your application is doing it wrong or you asked the user to provide irrelevant data (i.e. you asked him to provide data which you do not strictly need) and that is probably as bad.
Yes and no... The one detail you neglected is the question of when the data is relevant. Unless you're shipping a package immediately, you don't actually need the ZIP code right now. Even if you do have something to ship, I should be able to enter as much of the address as I know at the moment, email the recipient to get his ZIP code, wait a day or two for him to reply, and then have the package ship as soon as I get his email and enter the ZIP code. Allowing the user to proceed without entering all relevant data is not necessarily doing it wrong.
But that was just a side example which is even further OT than the discussion of whether NULL is a good or a bad thing...
| [reply] [d/l] [select] |
|
|
|
|
Re^4: DBI: passing undef as an argument
by zwon (Abbot) on Aug 08, 2009 at 22:12 UTC
|
So how then you propose to handle undefined values in databases? What should we place into ExpiryDate column if record doesn't expire? | [reply] |
|
|
Add an extra field "Expires" with a type of "Boolean". If this field indicates the record does not expire, we just ignore whatever datetime value is in the "ExpiryDate" field.
CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
| [reply] |
|
|
| [reply] |
|
|
|
|
SELECT *
FROM T
WHERE HasAmount
AND ( Amount = 10 OR NOT (Amount = 10) )
Provided Field:
SELECT *
FROM T
WHERE NOT IsNULL(Amount)
AND ( Amount = 10 OR NOT (Amount = 10) )
| [reply] [d/l] [select] |
|
|
|
|