in reply to Re: Should it be database independent?
in thread Should it be database independent?

You're not the only one who deals with huge tables. I haven't done the billion+ in a given table, but I have done queries against million+ JOIN million+ JOIN million+ for reports that had to return in under 5 seconds, including rendering to the user.

If you're stuck with MySQL, Access, or some other limited SQL vernacular, . . . doing significant things with the data it returns (e.g., data mining, KD, etc.)

That's funny ... I was doing data mining with MySQL. (That's those 1E18-row queries I was talking about before.) Access is limited, but it was never meant to be a RDBMS - that's what SQL*Server is for. MySQL may have been limited in the past, but it is a fully-functional RDBMS that supports ACID transactions and complete failover.

. . . a more robust DBMS that can support big chunks of that processing with its SQL syntax.

I'm curious - what's Oracle's version of LIMIT again? What about Oracle's restrictions on full-text searches? Or, how about Oracle's inability to turn ACID off when I, the developer, know I will never want it? Or, was it the overly-complicated clustering technology that Oracle itself agrees has a single failure point? Please elaborate ...

FWIW: DBI has been mentioned numerous times here; you may find SQL::Preproc' a simpler/nimbler solution.

It's also a source filter. Source filters have a few drawbacks:

Personally, I drink the "Separation-Of-Concerns" KoolAid™ and keep my Perl and SQL separate. This means using either Class::DBI (if I want an OO abstraction) or DBI (if I want performance).

Being right, does not endow the right to be rude; politeness costs nothing.
Being unknowing, is not the same as being stupid.
Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

  • Comment on Re^2: Should it be database independent?

Replies are listed 'Best First'.
Re^3: Should it be database independent?
by kimanaw (Beadle) on Mar 22, 2005 at 19:13 UTC
    You're not the only one who deals with huge tables. I haven't done the billion+ in a given table, but I have done queries against million+ JOIN million+ JOIN million+ for reports that had to return in under 5 seconds, including rendering to the user.

    Unless those were outer joins, the number is likely much less than 1E18 (I'm assuming MySQL's optimizer is smart enough to minimize join processing). Did they include any range joins on non-index columns ?

    That's funny ... I was doing data mining with MySQL. (That's those 1E18-row queries I was talking about before.) Access is limited, but it was never meant to be a RDBMS - that's what SQL*Server is for. MySQL may have been limited in the past, but it is a fully-functional RDBMS that supports ACID transactions and complete failover.

    I'll take your word for it, tho my review of their SQL syntax leads me to believe they still have some way to go. (I've obviously stepped into the MySQL vs. the World hornet's nest, I should know by now not to bring it up..)

    I'm curious - what's Oracle's version of LIMIT again? What about Oracle's restrictions on full-text searches? Or, how about Oracle's inability to turn ACID off when I, the developer, know I will never want it? Or, was it the overly-complicated clustering technology that Oracle itself agrees has a single failure point? Please elaborate ...

    Hmm, since I never mentioned Oracle in my post, I'm not certain what question I'm being asked to elaborate on?

    It's also a source filter. Source filters have a few drawbacks:

    * Slower than a non-filter - they have their own parsing section which is run in Perl

    A relative factor. If the application is just running "SELECT NOW()", you're quite right. If however, you're doing a large scale complex query, or multiple queries, the net impact of the source filtering is likely much less than the actual database processing time. And, of course, if run under a persistent environment, the source filtering happens during the load, not runtime. Since SQL::Preproc just emits DBI calls, runtime performance should be comparable to a DBI app.

    * Can make mistakes - Only perl can parse Perl, which includes non-Perl stuff

    Hmmm, last time I checked, perl can make mistakes too (thats part of the reason the release numbers have that 3rd digit). For that matter, I've heard that perl programmers occasionally make mistakes. So, based on your logic, I guess we all should stop programming in perl...or perhaps altogether.

    * May not work in a persistent environment correctly - this includes mod_perl

    Can't answer that one, except to say I don't see anything in SQL::Preproc that persists anything the application can't override.

    SQL::Preproc can make some things quicker/simpler to implement. Its intended as a "simple things should be easy, and hard things should be possible" solution for developers that are comfortable with SQL.

    Personally, I drink the "Separation-Of-Concerns" KoolAid™ and keep my Perl and SQL separate. This means using either Class::DBI (if I want an OO abstraction) or DBI (if I want performance).

    A matter of preferences, then. Some folks prefer a little mustard on their hot dogs, and some of us prefer a little hot dog with our musard. I guess the point I was originally trying to make was that I've seen many instances where separation/encapsulation led to throwing an object wrapper around a cursor on a large table, and processing every returned row on the client, in perl (or Java or C++, or whatever), and ignoring the capability of the underlying DBMS and/or its SQL syntax (MySQL included!). Once those little wrappers get written - and esp. if they're written to a least common denominator SQL -, they tend to take on a life of their own, and opportunities for exploiting powerful DBMS features get ignored or shut out.

      Unless those were outer joins, the number is likely much less than 1E18 . . .

      Regardless of how the query was handled, the fact is that it involved three tables, each with a million+. RDBMSes that can handle huge single tables can sometimes choke on queries involving joins of many smallers tables.

      Did they include any range joins on non-index columns ?

      No, because I could avoid them based on application requirements. I actually avoided every single non-index join I could. I did have to do non-index range checks on date fields, but that's par for the course.

      Can make mistakes - Only perl can parse Perl, which includes non-Perl stuff

      You're missing the point - source filters can break on legal syntax, leading to very difficult-to-diagnose errors. Filtering Perl syntax is harder than most, but even the simplistic SQL syntax can be hard at times.

      Hmm, since I never mentioned Oracle in my post, I'm not certain what question I'm being asked to elaborate on?

      DB2, Sybase, whatever. The point is that each RDBMS has its strengths and its weaknesses. I will be the first to point out that MySQL will not have production-ready triggers or updateable views for at least 6 months to a year. And, even then, I wouldn't use it. Yet, MySQL is enterprise-ready and is faster under many circumstances than every other RDBMS (commercial or OSS) out there, including Oracle, Sybase, and DB2. Those circumstances can include very complex queries against tables with millions of rows in them.

      Can't answer that one, except to say I don't see anything in SQL::Preproc that persists anything the application can't override.

      That's not the problem. Some source filters use INIT and CHECK blocks to do their work. Others, like Filter::Simple (which is what SQL::Preproc uses) don't. BEGIN, INIT, and CHECK blocks don't always behave as expected in persistent environments.

      I've seen many instances where separation/encapsulation led to throwing an object wrapper around a cursor on a large table, . . .

      q.v. OO concepts and relational databases for my thoughts on that.

      Being right, does not endow the right to be rude; politeness costs nothing.
      Being unknowing, is not the same as being stupid.
      Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
      Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re^3: Should it be database independent?
by jplindstrom (Monsignor) on Mar 22, 2005 at 22:03 UTC
    I'm curious - what's Oracle's version of LIMIT again?

    Tom knows

    /J

      It was a trick question. I know the rownum trick, but it's not the same.

      On further checking, rownum doesn't respect order by and you have to use a nested select if you want the top N of an ordered set.

      Taken from http://c2.com/cgi/wiki?ThinkSqlAsConstraintSatisfaction, about halfway down the page. Further more (and I haven't looked for the reference), limit works with nested queries correctly where rownum doesn't always work. It's a neat almost-the-same, but it's not.

      Being right, does not endow the right to be rude; politeness costs nothing.
      Being unknowing, is not the same as being stupid.
      Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
      Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

        I'm not really interested in arguing for or against Oracle, but I'm interested in how you think the Oracle version differs semantically from what you're used to in MySQL. "getting rows N through M of a result set" sounds a lot like the LIMIT clause regardless of how clumsy the actual syntax is. So I guess I just don't get your point.

        Personally I think syntactic sugar matters a lot, so I wouldn't mind if they extended their SQL dialect to support this since it's obviously something a lot of people have a need for.

        (The issue about rowid and "order by" etc is a known issue but it's not a problem with the solution presented at Ask Tom.)

        /J