in reply to Re: DBI: passing undef as an argument
in thread DBI: passing undef as an argument

Ask yourself, what does a date with the value of "NULL" mean?

Like undef, it either means that there isn't a date or that the date is unknown. This is, unfortunately, ambiguous, but can't really be resolved without adding a second "KNOWN BUT EMPTY" (non-)value, leaving NULL to specifically mean "unknown", since dates don't have any real equivalent to zero (for numbers) or empty strings (for text).

Will it be equal, unequal, earlier or later than any other date?

No, it will not be any of these, as you cannot know whether an unknown date is equal, unequal, earlier, or later than any other date.

Can you do date arithmetic on it (what is "NULL" plus three days?)?

No, you cannot calculate "an unknown date plus three days". The result will still be unknown.

Attempting to determine whether a known-nonexistent date is before or after a given date, or the date three days after a known-nonexistent date, is similarly nonsensical. e.g., You can't say whether I was released from prison before or after my 25th birthday or what the date was three days after I met Richard Nixon because I've neither been to prison nor seen Nixon.

If I recall correctly, SQL's semantics for handling NULL are consistent with my interpretation - any logical comparison to NULL will return false1 and NULL + 3 = NULL.

1 Tri-state logic would allow comparisons to more properly return "unknown", but, in boolean logic, "true" and "false" are all we've got to work with and asserting that unknown results are implicitly true would be worse than considering them to be false.

Replies are listed 'Best First'.
Re^3: DBI: passing undef as an argument
by CountZero (Bishop) on Aug 08, 2009 at 21:53 UTC
    All your comments just strengthen the position that the use of "NULL" in SQL is flawed, since it has multiple meanings and there is no way to discern which of its meanings is to be considered.

    If there is ambiguity in your database, how can you ever trust its results?

    CountZero

    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      The real world is often ambiguous. That's life. Would it be better if SQL supported two distinct values, NULL for "I don't know" and, say, EMPTY for "it is known to not exist"? Yeah, perhaps, but the ambiguity of having NULL serve both functions is decidedly less bad than deliberately inserting known-incorrect information1 just for the sake of having something there, regardless of whether it's meaningful (or accurate) or not.

      I believe that we should design software for the convenience of the end user, not the convenience of the computer. Data integrity is a good thing, but making it absolutely iron-clad in all cases is not worth placing unreasonable requirements on the user. (It is in some cases, but far from all.)

      Many years ago, I read a book2 which used the example of software which absolutely refuses to accept the entry of addresses lacking ZIP codes being considered acceptable, even though you would fire a secretary who would refuse to write down as much of an address as you know at the time, even if it's incomplete. I see little difference between banning addresses with unknown ZIP codes and banning NULL. In both cases you're demanding that the user must know everything about all the relevant data before allowing them to save any of it.

      Granted, you could both ban NULLs and allow incomplete entry by creating secondary boolean fields attached to every field in the table to indicate which of them contain meaningful data and which are just arbitrary values that are there purely for the sake of having something in there. But, at that point, haven't you basically just reinvented NULL in a non-standard and (much) less convenient form?

      1 e.g, Your suggestion of requiring an expiration date to be set even on records which don't expire, combined with a second boolean field whose purpose is essentially to say "ignore that other value - it's a lie".

      2 I think it was Alan Cooper's About Face, but I'm not positive.

        I fully agree with you that the use of "NULL" is some|many times very convenient and not too bad, in a "if you squint your eyes a little and carefully design your database and the apllication that it uses" kind of way.

        BUT, that is only the implementation of the relational model and not the relational model itself. The relational model cannot live with single valued "NULL"-values. The practical implementations of the relational model (aka RDBMS) use "NULL" all the time and society did not crash, so it cannot be that bad. However, in my book that is not a good reason to indiscriminately use "NULL" to mean "unknown" if the database itself cannot tell you what "unknown" means. So you must rely on external information to disambiguate its meaning. For example: what does an "NULL" means in the "salary" field? "This person is salaried but we do not know his exact salary" or "This person is unemployed and has no salary" or "This person is salaried and we know his salary but he did not allow us to store this infirmation in a public database as this info is considered private and confidential"?

        And as for your example of the address and the ZIP-code, I can easily think of an application where the zip-code info is absolutely necessary for it to work (perhaps to calculate shipping costs?) and where a "NULL" would break the aplication. And yes, by its very definition we must know all relevant data. If you allow the user to proceed without having given all relevant data, then either your application is doing it wrong or you asked the user to provide irrelevant data (i.e. you asked him to provide data which you do not strictly need) and that is probably as bad.

        But, at that point, haven't you basically just reinvented NULL in a non-standard and (much) less convenient form?
        No, I did not. You can give the extra field a well defined meaning and this meaning is now internal to the database.

        CountZero

        A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      So how then you propose to handle undefined values in databases? What should we place into ExpiryDate column if record doesn't expire?
        Add an extra field "Expires" with a type of "Boolean". If this field indicates the record does not expire, we just ignore whatever datetime value is in the "ExpiryDate" field.

        CountZero

        A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James