Re^5: DBI: passing undef as an argument (in reply to Re^4)

I fully agree with you that using "NULL" is sometimes (even often) very convenient and not too bad, in an "if you squint a little and carefully design your database and the application that uses it" kind of way.

BUT, that is only the implementation of the relational model, not the relational model itself. The relational model cannot live with a single, undifferentiated "NULL" value. The practical implementations of the relational model (aka RDBMSs) use "NULL" all the time and society did not crash, so it cannot be that bad. However, in my book that is not a good reason to indiscriminately use "NULL" to mean "unknown" when the database itself cannot tell you what "unknown" means, because then you must rely on external information to disambiguate it. For example, what does a "NULL" mean in the "salary" field? "This person is salaried but we do not know his exact salary", or "This person is unemployed and has no salary", or "This person is salaried and we know his salary, but he did not allow us to store this information in a public database because it is considered private and confidential"?

And as for your example of the address and the ZIP code, I can easily think of an application where the ZIP code is absolutely necessary for it to work (perhaps to calculate shipping costs?) and where a "NULL" would break the application. And yes, by its very definition we must know all relevant data. If you allow the user to proceed without having given all relevant data, then either your application is doing it wrong or you asked the user to provide irrelevant data (i.e., data which you do not strictly need), and that is probably just as bad.

But, at that point, haven't you basically just reinvented NULL in a non-standard and (much) less convenient form?
No, I did not. You can give the extra field a well-defined meaning, and that meaning now lives inside the database itself.

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James


Re^6: DBI: passing undef as an argument
by dsheroh (Monsignor) on Aug 10, 2009 at 09:53 UTC
    For example: what does a "NULL" mean in the "salary" field? "This person is salaried but we do not know his exact salary" or "This person is unemployed and has no salary" or "This person is salaried and we know his salary but he did not allow us to store this information in a public database as this info is considered private and confidential"?

    I agree that this is definitely a case where a separate no_salary_reason column would be highly appropriate but, even so, I would still go with a non-value (i.e., NULL) in the salary column itself rather than inserting a fictitious value, which would then doom me to never again being able to say WHERE salary... without adding AND NOT salary_value_is_fictitious.

    And as for your example of the address and the ZIP-code, I can easily think of an application where the zip-code info is absolutely necessary for it to work (perhaps to calculate shipping costs?) and where a "NULL" would break the application. And yes, by its very definition we must know all relevant data. If you allow the user to proceed without having given all relevant data, then either your application is doing it wrong or you asked the user to provide irrelevant data (i.e. you asked him to provide data which you do not strictly need) and that is probably as bad.

    Yes and no... The one detail you neglected is the question of when the data is relevant. Unless you're shipping a package immediately, you don't actually need the ZIP code right now. Even if you do have something to ship, I should be able to enter as much of the address as I know at the moment, email the recipient to get his ZIP code, wait a day or two for him to reply, and then have the package ship as soon as I get his email and enter the ZIP code. Allowing the user to proceed without entering all relevant data is not necessarily doing it wrong.

    But that was just a side example, which is even further off-topic than the discussion of whether NULL is a good or a bad thing...

      rather than inserting a fictitious value which would then doom me to never again being able to say WHERE salary... without adding AND NOT salary_value_is_fictitious
      You do not need a "salary_value_is_fictitious" field, but rather a field which says why you cannot use the "salary" field. It is exactly my argument that "NULL" is ambiguous in that respect. If your database model somehow allows the salary field to be nonsense (perhaps because the database includes unsalaried people), then your model must cater for that somehow, and allowing "NULL" is by far the worst way to do so, as by its very essence it cannot contain any meaning in and by itself: it is, in a way, the very absence of meaning.

      Even if you do have something to ship, I should be able to enter as much of the address as I know at the moment, email the recipient to get his ZIP code, wait a day or two for him to reply, and then have the package ship as soon as I get his email and enter the ZIP code. Allowing the user to proceed without entering all relevant data is not necessarily doing it wrong.
      In that case, just enter nothing in the ZIP-code field, and by "nothing" I mean the empty string, which is different from NULL (NULL is more like Perl's undef).
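      To make the empty-string/NULL distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for DBI (sqlite3 binds None as NULL, much as DBI binds undef). The orders table and its zip column are hypothetical. One caveat: Oracle treats an empty VARCHAR2 string as NULL, so this distinction does not hold there.

```python
import sqlite3

# In-memory SQLite database; the "orders" table is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, zip TEXT)")
conn.executemany(
    "INSERT INTO orders (id, zip) VALUES (?, ?)",
    [(1, "90210"), (2, ""), (3, None)],  # None is bound as SQL NULL
)

def ids(sql):
    """Return the sorted ids of the rows matched by a query."""
    return sorted(row[0] for row in conn.execute(sql))

empty_zip = ids("SELECT id FROM orders WHERE zip = ''")        # matches only the empty string
null_zip = ids("SELECT id FROM orders WHERE zip IS NULL")      # matches only the NULL
eq_null = ids("SELECT id FROM orders WHERE zip = NULL")        # NULL = NULL is unknown: no rows
not_known = ids("SELECT id FROM orders WHERE zip <> '90210'")  # NULL <> '90210' is unknown: row 3 dropped

print(empty_zip, null_zip, eq_null, not_known)
```

      Note that zip = NULL matches nothing at all: any ordinary comparison with NULL yields "unknown", which is why SQL has a separate IS NULL test.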

      CountZero


        You do not need a "salary_value_is_fictitious" field but a field which says why you cannot use the "salary" field.

        Nonetheless, it is the same doom I mentioned earlier, even if the flag also indicates why the salary is fictitious. You will still never be able to just say WHERE salary < 100000. With a separate flag field it needs to be WHERE salary < 100000 AND unusable_salary_reason = 0. And heaven help you if someone unaware of this scheme (or just forgetful) runs a query without checking the unusable_salary_reason and takes action based on any fictitious values that appear valid if you look at salary in isolation but are actually flagged as unusable in the other column... (Granted, NULL has odd semantics which can catch the uninitiated by surprise, but they are standardized, so they're much less prone to this kind of problem.)
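        A small sketch of the two designs being debated, again using Python's sqlite3 as a stand-in for DBI. The table names, the placeholder salary of 0 and the numeric reason codes (0 = usable) are assumptions made for illustration, not anyone's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Design 1 (hypothetical): salary is NOT NULL; an unusable salary holds a
# placeholder 0 and the reason code lives in unusable_salary_reason (0 = usable).
conn.execute("""CREATE TABLE staff_flagged (
    name TEXT, salary INTEGER NOT NULL, unusable_salary_reason INTEGER NOT NULL)""")
conn.executemany("INSERT INTO staff_flagged VALUES (?, ?, ?)", [
    ("alice", 85000, 0),
    ("bob", 0, 1),    # unemployed: placeholder salary
    ("carol", 0, 2),  # confidential: placeholder salary
])

# Design 2 (hypothetical): salary is NULL when unusable; the reason column is kept.
conn.execute("""CREATE TABLE staff_nullable (
    name TEXT, salary INTEGER, no_salary_reason INTEGER)""")
conn.executemany("INSERT INTO staff_nullable VALUES (?, ?, ?)", [
    ("alice", 85000, None),
    ("bob", None, 1),
    ("carol", None, 2),
])

def names(sql):
    """Return the sorted names matched by a query."""
    return sorted(row[0] for row in conn.execute(sql))

# Forgetting the flag silently returns the placeholder rows.
forgot_flag = names("SELECT name FROM staff_flagged WHERE salary < 100000")
# Remembering the flag gives the intended answer.
with_flag = names(
    "SELECT name FROM staff_flagged "
    "WHERE salary < 100000 AND unusable_salary_reason = 0")
# With NULL, the naive query is already correct: NULL < 100000 is unknown.
with_null = names("SELECT name FROM staff_nullable WHERE salary < 100000")
# The "odd semantics": NULL rows also fail the negated condition.
neither = names("SELECT name FROM staff_nullable WHERE NOT (salary < 100000)")

print(forgot_flag, with_flag, with_null, neither)
```

        The last query shows the surprise mentioned above: a row with a NULL salary matches neither salary < 100000 nor NOT (salary < 100000).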

        You are correct that allowing NULL often (not always, but often) calls for a second field to clarify what the NULL means. My basic point of disagreement here is that NULL allows you to indicate within the salary field itself that its value is unusable, while the way you've described of doing it without allowing NULL requires you to check a second field every time you access the salary so that you can determine whether the value claimed by the salary field is usable or not.

        allowing "NULL" is by far the worst way to do so, as by its very essence it cannot contain any meaning in and by itself: it is, in a way, the very absence of meaning.

        I like that. "The very absence of meaning" is a good description of NULL and the reason why I support its use: If the value of a field is meaningless for a given record, then NULL allows you to indicate this fact within the field itself. This does not in any way prevent you from also using a second field to indicate the reason why it's NULL.
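        One way to get both properties in a single table is a CHECK constraint that ties the two columns together: salary may be NULL only when a reason is recorded, and a reason may be recorded only when salary is NULL. This is a sketch under assumptions (SQLite via Python's sqlite3; the staff table and the reason strings are hypothetical). Most engines enforce CHECK constraints, though MySQL parsed and ignored them before 8.0.16.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: a NULL salary must come with a reason, and a reason
# may only be given when the salary is NULL.
conn.execute("""
    CREATE TABLE staff (
        name TEXT PRIMARY KEY,
        salary INTEGER,
        no_salary_reason TEXT
            CHECK (no_salary_reason IN ('unemployed', 'unknown', 'confidential')),
        CHECK ((salary IS NULL) = (no_salary_reason IS NOT NULL))
    )
""")

def try_insert(row):
    """Insert a row; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO staff VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:  # raised for CHECK constraint failures
        return False

results = [
    try_insert(("alice", 85000, None)),           # known salary, no reason: accepted
    try_insert(("bob", None, "unemployed")),      # NULL with a reason: accepted
    try_insert(("dave", None, None)),             # bare, unexplained NULL: rejected
    try_insert(("erin", 50000, "confidential")),  # value and reason together: rejected
]
print(results)  # [True, True, False, False]

# The reason for every NULL salary can now be read from inside the database.
reasons = conn.execute(
    "SELECT name, no_salary_reason FROM staff WHERE salary IS NULL").fetchall()
```

        With this constraint a plain WHERE salary < 100000 stays safe, while the meaning of each NULL is still recorded in the database itself.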