in reply to DBI: passing undef as an argument

That is probably a blessing in disguise. There are very very good reasons never ever to allow a "NULL" (SQL's version of our undef) in any of the fields of a relational database. It can really screw up the logic of your code and give you unexpected (and wrong) results.

Ask yourself, what does a date with the value of "NULL" mean? Will it be equal, unequal, earlier or later than any other date? Can you do date arithmetic on it (what is "NULL" plus three days?)?

CountZero

A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Replies are listed 'Best First'.
Re^2: DBI: passing undef as an argument
by ikegami (Patriarch) on Aug 08, 2009 at 15:25 UTC

    That's complete nonsense. There are plenty of good reasons to use NULL. For example, a NULL expiry date could mean there isn't an expiry date.

      This is not nonsense.

      SQL's use of "NULL" is inherently flawed as the relational model does not work with "NULL" values as implemented by SQL. Codd himself (in "The Relational Model for Database Management: Version 2) suggested that the SQL implementation of NULL was flawed and should be replaced by two distinct NULL-type markers. The markers he proposed were to stand for "Missing but Applicable" and "Missing but Inapplicable", known as A-values and I-values, respectively. Codd's recommendation, if accepted, would have required the implementation of a four-valued logic in SQL. None of the common RDBMs in use do this and neither does any SQL standard.

      Of course, the SQL standard is a flawed implementation of the relational model in any case, but as it is "the best" (meaning the only one we have), we will have to live with it.

      To give an example of how flawed the use of "NULL" is, consider the following:

      1. A database table T with two columns: ID-number and Amount.
      2. The following query: SELECT ID-number, Amount FROM T WHERE Amount = 10 OR NOT(Amount = 10)
      3. This query should return ALL rows, since either the amount is equal to 10 or it is not. It does not matter whether the amount is actually unknown: it can only be "10" or "not 10". Yet this query will not return a row where "Amount = NULL".
      Although the answer given by the SQL query is correct according to the truth tables of ternary ("three valued") logic, it is plainly wrong in the real world and one of the tenets of the relational database model is that you must be able to construct predicates on basis of the relation which are true in the real world. This is the so-called "Closed World Assumption": Everything stated by the database, either explicitly or implicitly, is true; everything else is false. The use of NULL as implemented by SQL breaks this assumption.

      I agree that "NULL" comes in handy in using as a placeholder for an unknown value, but in many cases the use of "NULL" indicates that we have not been thinking the design of our database through in all its aspects and that something should be changed there.

      CountZero

      A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        You trying to prove it shouldn't be used by saying we should use two different types of it.

        You trying to prove it shouldn't be used by saying it's handy.

        You explain that it's flawed, but something being flawed shouldn't necessarily be avoided.

        To say the least, you haven't made your point.

        I have to disagree. Null as a place holder for an unknown value is a perfectly acceptable use of null. You've contrived an example that only shows how flawed the design was to allow null values into the database in the first place. It's not the 'null' or the 'select' that fails to count null values at fault, it's the table definition that allows for null values to be entered when none are expected to be there. If nulls are not part of your world then don't create tables that allow null values to be entered.
        It seems to me that the Closed World Assumption demands that your example must return the "plainly wrong" result that it does:

        This is the so-called "Closed World Assumption": Everything stated by the database, either explicitly or implicitly, is true; everything else is false.

        The database in your example does not state, either explicitly or implicitly, that Amount is 10, therefore this is false. It also does not state that Amount is not 10, therefore this is also false.

Re^2: DBI: passing undef as an argument
by dsheroh (Monsignor) on Aug 08, 2009 at 16:35 UTC
    Ask yourself, what does a date with the value of "NULL" mean?

    Like undef, it either means that there isn't a date or that the date is unknown. This is, unfortunately, ambiguous, but can't really be resolved without adding a second "KNOWN BUT EMPTY" (non-)value, leaving NULL to specifically mean "unknown", since dates don't have any real equivalent to zero (for numbers) or empty strings (for text).

    Will it be equal, unequal, earlier or later than any other date?

    No, it will not be any of these, as you cannot know whether an unknown date is equal, unequal, earlier, or later than any other date.

    Can you do date arithmetic on it (what is "NULL" plus three days?)?

    No, you cannot calculate "an unknown date plus three days". The result will still be unknown.

    Attempting to determine whether a known-nonexistent date is before or after a given date, or the date three days after a known-nonexistent date, is similarly nonsensical. e.g., You can't say whether I was released from prison before or after my 25th birthday or what the date was three days after I met Richard Nixon because I've neither been to prison nor seen Nixon.

    If I recall correctly, SQL's semantics for handling NULL are consistent with my interpretation - any logical comparison to NULL will return false1 and NULL + 3 = NULL.

    1 Tri-state logic would allow comparisons to more properly return "unknown", but, in boolean logic, "true" and "false" are all we've got to work with and asserting that unknown results are implicitly true would be worse than considering them to be false.

      All your comments just strengthen the position that the use of "NULL" in SQL is flawed, since it has multiple meanings and there is no way to discern which of its meanings is to be considered.

      If there is ambiguity in your database, how can you ever trust its results?

      CountZero

      A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        The real world is often ambiguous. That's life. Would it be better if SQL supported two distinct values, NULL for "I don't know" and, say, EMPTY for "it is known to not exist"? Yeah, perhaps, but the ambiguity of having NULL serve both functions is decidedly less bad than deliberately inserting known-incorrect information1 just for the sake of having something there, regardless of whether it's meaningful (or accurate) or not.

        I believe that we should design software for the convenience of the end user, not the convenience of the computer. Data integrity is a good thing, but making it absolutely iron-clad in all cases is not worth placing unreasonable requirements on the user. (It is in some cases, but far from all.)

        Many years ago, I read a book2 which used the example of software which absolutely refuses to accept the entry of addresses lacking ZIP codes being considered acceptable, even though you would fire a secretary who would refuse to write down as much of an address as you know at the time, even if it's incomplete. I see little difference between banning addresses with unknown ZIP codes and banning NULL. In both cases you're demanding that the user must know everything about all the relevant data before allowing them to save any of it.

        Granted, you could both ban NULLs and allow incomplete entry by creating secondary boolean fields attached to every field in the table to indicate which of them contain meaningful data and which are just arbitrary values that are there purely for the sake of having something in there. But, at that point, haven't you basically just reinvented NULL in a non-standard and (much) less convenient form?

        1 e.g, Your suggestion of requiring an expiration date to be set even on records which don't expire, combined with a second boolean field whose purpose is essentially to say "ignore that other value - it's a lie".

        2 I think it was Alan Cooper's About Face, but I'm not positive.

        So how then you propose to handle undefined values in databases? What should we place into ExpiryDate column if record doesn't expire?