in reply to (OT) Why SQL Sucks (with a little Perl to fix it)

The fact that DISTINCT is required to deal with tables (or DBMS's) which permit multisets shouldn't be a surprise, and hardly seems a cause for complaint. I could likewise complain that a Perl hash will destroy the old value when a tuple with a duplicate key is added; hence, "Perl sucks". All languages have behavioral limits. In your example, its a very logical behavior, given the operational criteria (i.e., using a DBMS that permits multisets).

You may want to visit Database Debunkings on a regular basis. Mssr. Pascal is a well known critic (crank?) about SQL, so you may find a comrade to support your contentions.

For added fun, read why NULLs cause problems in databases.

As opposed to "why NULLs (aka undef's) never cause problems in <Perl | C | Java | ... >" ? NULL represents a state. It can be inconvenient, but using a DBMS that enforces integrity, and then using that capability, can often limit your exposure to the issue...but NULL exists for a reason (now theres a non sequitar!).

As to your update, I don't really see the parallel. If you've designed a database which doesn't use unique keys, or don't bother to apply the complete key, then you'll likely get into trouble. Just as if you use a Perl hash with non-unique keys.

I should declare that I'm solidly in the SQL fanboy camp (as if that wasn't obvious). I've been using SQL for 20 years, nearly as long as I've used C or Pascal, and a lot longer than Java or Perl. More importantly, I've used it on some very large (multiterabyte, giga-rowset) datasets and very complex problems. Fortunately, I've used it primarily on a DBMS with a very robust/feature rich SQL dialect.

Most complaints I hear regarding SQL can usually be classified in one of the following categories:

(The best clue that either of these issues exists is a quick scan of their code: it usually consists of lots of row-at-a-time cursor operations in nested loops.)

I noted that your examples were based on SQLite. While SQLite is convenient, it is hardly an example of a robust, feature rich SQL or relational algebra implementation. While its nice that you found a reason to use Perl to "fix" it, it might be a better idea in practice to use a more robust DBMS. I could likewise post an article titled, "Perl sucks (and here's some C to fix it)", and bury the fact that I'm trying to implement a realtime high frequency DSP algorithm in Perl. Not exactly an appropriate title. Perhaps you should update your OP title to "SQLite sucks at handling duplicate rows (and some Perl to fix it)" ?

  • Comment on Re: (OT) Why SQL Sucks (with a little Perl to fix it)

Replies are listed 'Best First'.
Re^2: (OT) Why SQL Sucks (with a little Perl to fix it)
by BrowserUk (Patriarch) on Dec 11, 2005 at 04:11 UTC
    The complainer lacks a full understanding or appreciation of set and relational theory

    The problem with SQL is that it requires "a full understanding or appreciation of set and relational theory" to use it properly, but offers itself to the world through it's "friendly" declarative syntax.

    It encourages people to think that they are competent because their queries return the results they expect. What they often do not understand is the costs that it went through to produce them, nor how much their query was optimised for them under the covers (in the case of the more sophisticated implementations--usually the commercial ones).

    Perl has a similar problem in that it's accessibility make it easy to 'get something to go' for the beginner without grounding, but without the grounding, they do not appreciate the redundancies and inefficiencies in how they get the results.

    SQL as a language is ill-fitting to the operations of set theory, with multiple, verbose, and often awkward and ambiguous ways of specifying relationships that would be concise and unambiguous in set notation. Many years ago I was trying to optimise a complex select across a 7-way join of some pretty large tables (for the time!). Consulting a DB2 expert from IBM he showed us an APL statement that did the equivalent selection; it consisted of about 15 to 20 characters. In total.

    If you think I'm kidding, go here and scroll down to table under the heading "How Does APL Compare with other Languages".

    I never learnt APL, but I've often wondered what could be achieved with if RDBMSs would accept it as an alternate syntax. Given that Unicode now supports the APL character set, maybe it's time for APL's resurrection :)


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      It appears your argument is at odds with itself. You contend that SQL requires a full understanding of set theory, and then contend it doesn't. My initial statement was made in a "wholistic" context (sorry for the cliche). If one intends to use multiset tables, they need to understand the issues it presents, and the syntax elements available to address those issues. But it is possible to craft data models which don't permit multisets, and hence SQL users can be blissfully ignorant of the multiset issue. Likewise with NULLs: either add constraints to disallow them, or translate them to some other value which has a more specific meaning.

      Perhaps the real issue isn't SQL, but rather the myriad ways in which data models can be constructed. There are reasons to implement data models in one manner or another, depending on the "profile" of the problem set it addresses, balanced by performance and resource costs, and the skills of the user community. The fact there isn't a "one true" and simple SQL syntax to accomodate every combination of data model requirements shouldn't be surprising. (Of course, proper application of views can mitigate the need for "SQL savvy" users).

      As to "incorrect" resultsets, I'd contend there is no such thing. There are, however,

      • poorly formed queries
      • badly designed data models
      • poorly cleansed databases

      I never learnt APL, but I've often wondered what could be achieved with if RDBMSs would accept it as an alternate syntax.

      I actually did learn APL waaaaay back in the long ago time, and it was a very intriguing language. I can imagine implementing very complex queries with a handful of characters. However, if you think SQL is hard to learn to use properly with complex data models, then APL will likely make some heads explode. Also, APL's arcane albeit concise syntax doesn't solve the problems inherent in dealing with large datasets built for multiple purposes and large, diverse user communities. I'd assert that an APL solution would also require an optimizer under those circumstances, unless one has the luxury of permuting and storing every index tuple within the dataset, in which case no optimizer is needed (other than a crafty APL coder with a deep understanding of the data model). APL by itself doesn't make disks spin faster, nor CPU's compute faster, nor RAM to automatically expand.

      As to the complaint regarding SQL's fitness, what particular elements do you take issue with ? JOIN ? UNION ? INTERSECT ? DIFFERENCE ? Subqueries to derive subsets ? CASE ? What elements are ambiguous ? What you perceive as verbose, I see as "readable" (and I've read some *very* large queries w/ 16+ joins, and numerous complex subqueries with several OLAP clauses in them). (FWIW, I usually eschew more "punctuation" driven Perl solutions in preference to readable equivalents). While GROUP BY/HAVING have been a particular nemesis of mine over the years, I've come to assimilate their usage, just like obtuse elements of other languages.

      If SQL is such a poor language, why does it continue to be used ? Wouldn't Darwinism have long ago found a replacement to overtake it ? (I'm aware there are contenders, but they haven't seemed to gain much of a foothold). E.g., Java managed to rival (and perhaps displace) C++ in a relatively short time - IMHO, for good reasons. For that matter, Python and Ruby have made significant entry into the Perl user base. Yet SQL is nearly 30 years old, and it supplanted other contenders, e.g. QUEL, QBE. If SQL, and the current crop of RDBMS's are so bad, why hasn't someone built the replacement and become famous and fabulously wealthy ?

        It appears your argument is at odds with itself. You contend that SQL requires a full understanding of set theory, and then contend it doesn't.

        No. I agreed with you that SQL requires a full understanding of set and relational theory to use properly.

        I contend that it's English-like syntax makes it possible to do simple things easily; but that the use of English terminology that is frequently, many times overloaded with inexact colloquial, cultural and "commonplace" inferences and meanings, makes it easy for people to believe they understand what the keywords mean and do. And for simple queries, they get the results they expect, because the optimisers tidy up their misunderstandings.

        How many times have you heard

        When I use this clause in this query, it produces these results, which is what I was after. Therefore I understand what that clause does.

        However, when I try to incorporate that clause into this query, it should produce these results, but it doesn't?

        It is the combination of optimisers that quietly clean up badly phrased simple queries, and the ambiguity of terminology that overloads terms that have common-parlance meanings with mathematical concepts, that makes SQL syntax easy to misuse.

        Add to that the myriad ways in which datamodels can be constructed; the myriad variations of the syntax used to do that; the myriad variations in query syntax (NATURAL/INNER/OUTER/LEFT INNER/LEFT OUTER JOINs -versus- UNION/INTERSECT/DIFFERENCE etc.); and myriad (incompatible) proprietary extensions. The list goes on.


        APL by itself doesn't make disks spin faster, nor CPU's compute faster, nor RAM to automatically expand.

        Agreed. But the mathematically precise nature of the notation makes it a good fit with requirements of specifying precise set manipulations. To quote a little from the APL link I posted:

        English (or any other language) has evolved over centuries as the best form of oral and written communication between people. But there are better ways for communicating between people and computers and the APL symbol set seems to be the most productive system yet devised.

        Professor H. M. Courtney, University of Texas at Arlington, has suggested that English (this applies to other languages too) is not always the best language for legal communication. He proposes that APL be used as a statutory language to supplement English when quantitative and logical relationships are involved.

        He suggests that "APL contains an exceedingly rich collection of primitive functions which might be adequate to express clearly and comprehensively legislative intent", (where quantitative and logical relationships are involved), and that APL be used to supplement the usual English language legal statements to provide clarity and brevity.

        A direct benefit from such a practice would be that the APL statements would also be an actual computer program which could be used to test whether the statements give the expected results from sample data.


        What you perceive as verbose, I see as "readable" ... (FWIW, I usually eschew more "punctuation" driven Perl solutions in preference to readable equivalents).

        Another couple of quotes

        The extensive use of symbols (or shapes) for company advertising purposes; for communication of information in public places; for traffic signs; etc., shows that our culture is a symbol-oriented and that people take easily to this use.

        Some people believe that the ultimate goal is to program computers in English. However, experience to date shows that symbols may be better than English for specific cases of communication. For example, consider the programming and documentation of music. Can you imagine trying to write the score for Jingle Bells in words? "Three E's, the third one held twice as long as either of the other two; repeat; etc...!"

        Anyone who wishes can replace any or all of the APL primitive function symbols with words, quite easily. The simple algebra symbols would then become "Add", "Subtract", "Multiply", "Divide" (as in COBOL) instead of +, -, x, -. ... The point is, in APL you can select your own level of verbosity, whereas it is preordained in other computer languages.

        Can the 1/3rd of the world's population that use symbolic language notation be totally wrong? :)


        Another problem with SQL is that it is either impossible, or very hard to write reusable routines. In general(*), the lack of formal/actual arguments in PROCEDURE definitions means that whatever the procedure does, it will only do on the table(s) and column(s) hard coded within it.

        *Some proprietary extensions may allow this; and it is possible to simulate it to some extent by preselecting into a view and then writing the procedure in terms of the view, but that would like having to do you own formal/actual parameter mapping in a modern GP language. Oh wait... :0

        Contrast that with

        In fact, the APL language has about 100 primitive symbolic functions. The ability to pre-store independently existing variables eliminates the need for creating them within a program. Also, APL permits many statements to be combined into a single line of code. Since fewer lines are coded, programming time decreases. More important, the modularity of APL encourages the generation of many small interdependent programs. As a result debugging time is decreased by approximately 50%. Since half of all programming time is devoted to debugging, the time saved is significant.


        If SQL is such a poor language, why does it continue to be used ? Wouldn't Darwinism have long ago found a replacement to overtake it ? (I'm aware there are contenders, but they haven't seemed to gain much of a foothold).

        The same reason that it took 70 years, (and in this case legislation changes), before telecoms became competitive beyond one or a few, (depending upon the country), big players in each market place. When I asked for my first phone to be installed nearly 30 years ago, it took 6-12 weeks unless you could substantiate yourself as a doctor or similar. The last time I had one installed, they offered me installation in 2 days time, and the engineer turned up the following day "on the off chance". He'd had a cancellation in the area.

        In a word: infrastructure.

        SQL implementations evolve around the proprietary infrastructure of the RDBMS' that they come with. The infrastructure that supports the 5 or 6 major players in the RDBMS marketplace has taken those 30 years to evolve to where they are now.

        Try using your Oracle bulk-loading utility with a DB2 database. It ain't gonna happen. Equally, any alternative to SQL will require an efficient and powerful RDBMS to underly it before it is going to go anywhere.

        The progress in the OSS RDBMSs is rapidly gaining ground, with more and better implemented feature sets and more thoroughly proven engines. The time is now ripe for alternative query languages to be layered on top of those engines. It isn't going to happen over night, but it will happen.

        Will the old guard role over and die when it does? I doubt it, but they will finally have to start taking notice of the calls for something new, different or better. They already have the expertise in house, and many of them have already produced various proprietary extensions to their offerings; many of which pay little more than lip service to the structure and standards of SQL. Of course, they make sure that they are "compatible" with (their implementations of), SQL.

        DB2 has gone through some revamps lately, and IBM still have a considerable amount of latent expertise in APL. Maybe, the APL-as-a-query-language idea isn't such a stupid one after all :)


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re^2: (OT) Why SQL Sucks (with a little Perl to fix it)
by Ovid (Cardinal) on Dec 11, 2005 at 03:24 UTC

    The only reason I used SQLite is that it's a quick-n-dirty way of showing a concept. The problems I show are not related to SQLite, they tend to be database agnostic. SQL is bag-oriented, not set-oriented. As a result, it's very easy to get incorrect results with it. Those are details which we should be hesitant to expose in language.

    As for the problems with NULLs, I've if you read the link I posted, I show simple examples of how NULLs in a database easily allow queries to return incorrect answers. I've also worked on extremely large databases with huge queries and NULLs which cause spurious results are notoriously difficult to track down. Unfortunately, some of the techniques to limit the impact of NULLs -- using 6NF and distributed foreign keys -- are either serious performance bottlenecks due to current DBMS implementations or are not natively supported in SQL.

    Of course, similar criticisms apply to Perl and yes, I've made them. Perl has some weaknesses too and I don't mind admitting that. However, Perl is trying to eliminate many of them with Perl6 (whether it will be successful remains to be seen). I don't see many database vendors going back to the underlying relational theory to fix these issues. I see them hacking things on top of SQL, bragging about their object support or other things which, it seems to me, would largely be moot if SQL was "fixed" (which I doubt it ever will be).

    Sure, we can point to programmers who have a poor understanding of relational or set theory and say the problem lies with them (often it does). But to force them to use a language which doesn't support these concepts very well isn't going to make the situation better.

    Now if you can provide concrete examples of why Date is wrong, I'd happily hear counter-arguments.

    Cheers,
    Ovid

    New address of my CGI Course.

      I'll take a shot at that challenge, although really Date is right. It's more a matter of the semantics of right and wrong here.

      DISTINCT issues: First off, your examples make use of silent joins and unnormalized design, which makes everything look worse than it is. Normalization + Explicit Joins eliminate the problem or make it very obvious why it happens. It happens every time you select rows of one table, then jump across an many-to-{many|one} relationship to another table and pull out column data from there. A person versed in SQL will know to always use DISTINCT in this case, and that it is unneccesary in any other case assuming a normalized design. It is arguable that the semantics of SELECT should be that it automatically DISTINCTs the results in these cases, as the case is easily machine-detectable. But I don't think that makes all of SQL broken. It's just a repetitive idiom SQL users have to learn, like most languages have.

      NULLs: Quite simply, for most designs, most columns should be set to "NOT NULL". End of problem. Enabling NULL values in a column without thinking through the implications (which are indeed precarious) will of course cause issues down the line. The same can be said of many ill-thought-out design-time decisions.

Re^2: (OT) Why SQL Sucks (with a little Perl to fix it)
by fergal (Chaplain) on Dec 11, 2005 at 19:50 UTC
    Most complaints I hear regarding SQL can usually be classified in one of the following categories:
    • The complainer lacks a full understanding or appreciation of set and relational theory
    • The complaint is made in relation to a weak SQL implementation
    That may or may not be true but Date is one of the authors of the SQL standard and one of the leading authors of RDBMS books, so I think we can assume that his criticisms are not those of a "crank".

    One thing that does annoy me about Darwen and Date's criticisms is that they are a bit too fond of making statements about facts they have already established and just refering the reader to the paper/article/expensive-book-that-they-just-happen-to-have-written in which they were established. They are struggling against a huge mainstream of standard SQL/RDBMS dogma they need to make their arguments easily accessible and not put the burden on the reader - that's fine for academia with it's well-stocked libraries but a bit much to expect of the casual reader who has not yet drunk their kool-aid.

      They are struggling against a huge mainstream of standard SQL/RDBMS dogma they need to make their arguments easily accessible and not put the burden on the reader - that's fine for academia with it's well-stocked libraries but a bit much to expect of the casual reader who has not yet drunk their kool-aid.

      FWIW, that's just what Database in Depth is designed to do. O'Reilly has posted a link to the first chapter, as well as an interview with Chris Date.

      —Theory

        Cool, my employer will be ordering a copy ASAP :)

        I just started reading the interview and already he has a list of criticisms of the relational model which he refutes without explanation. I presume he doesn't like repeating himself but repeating himself is exactly what he should be doing if he wants people to understand his arguments.