marvell has asked for the wisdom of the Perl Monks concerning the following question:

A recent discussion on the subject of SQL with Zigster and other friends has let me to this question:

With all the sub selecting, maxing, having and grouping and other such SQLisms associated with quite complex queries, wouldn't it just be easier to select the lot into perl (row by row or otherwise) and do the processing there? Assuming, that is, (or maybe not) that the program receiving the data was perl anyway.

As there will be many factors, here are some points raised:

Would it be:

I have not benchmarked this, but my hunch says all of the above.

How would it fair if there was one or many tables involved?

Would it depend on the database?

In some cases, I imagine it would be vlaid to completely remove the database and replace it with flat files.

Another hunch of mine say that it will simpy come down to "it depends what you are doing". But that would be a shame.

Your thoughts would be appreciated.

--
Brother Marvell

Replies are listed 'Best First'.
Re: Replacing SQL with perl
by AgentM (Curate) on Dec 06, 2000 at 22:07 UTC
    Woah! Hold on there! The real answer is: none of the above! Faster? DBengines are specifically designed to optimize lookups, changes, and other operations. They store data in heavily optimized hash-like tables which beat the socks off flat files. Easier? Personal preference i guess, but isn't writing one SQL statement with all of its helpful stuff like placeholders (DBI) and *s easier than looping through hashes or arrays? Neater? connect, grab, close, use data- and you're ready to fly!

    ESPECIALLY for the complex queries are you more interested in using built-in DB optimizations like functions, cacheing, locking, and the such- why would you NOT want to use it? Storing info in your cgi can even be dangerous since it would never know when the data would become outdated. In short- USE AN SQL SERVER if you WANT speed, fast devel time, and neatness- you just can't lose!

    What kind of situations do you imagine that would benefit from dropping the db? I would imagine only insanley simple things like an included file to print to the user but that not may even benefit. AND, with the DB, you get a great way to organize and even PROTECT information from prying eyes. A well-organized table beats recursive searches through the filesystem anyday! The only time you want to do any processing on the data after you have it is when you may have data that you need to compare it to or its something that the db can't handle or doesn't support or you're worried about DBengine load (speed) if a complex operation is taking too much time.

    AgentM Systems nor Nasca Enterprises nor Bone::Easy nor Macperl is responsible for the comments made by AgentM. Remember, you can build any logical system with NOR.
      I didn't mean replace the database with file for all applications, just some. I've worked on projects with over powered, over indexed and over network databases. In those situations, the overheads have been huge. The dataset was small and it just didn't need it.

      Again, I'm not suggesting ditching databases, as a rule, just ditching complex queries.

      Obviously, write queries will have a very special arrangement. As you say, all the locking and stuff would be impossible. I suppose my question would be more aimed at read only queries.

      --
      Brother Marvell

        Even so, I'm in favor of keeping my data together, even across a network. A small optimization by hard-coding in data is hardly worth spreading it around to 30 different scripts that will need it. Also, keeping data centralized allows for maximum expandability. If all of the REAL variables in your program are stored in the db, then you have no problem changing it, if, let's say, to new entries must be accomodated for.

        But, if you have 100% static data, then, by all means, put it whereever you want (though I would still like to keep the data centralized). This occurs so rarely, that this may not even be useful. Using perl on flat-files (especially CGIs on the same file) are dangerous since race conditions may occur and corrupt data. In this case, you may need to use a non-perl mutex server (like the one I'm writing now). So, 100% static, read-only data it is, then! Well, you might as well make it a constant var, then, eh?

        AgentM Systems nor Nasca Enterprises nor Bone::Easy nor Macperl is responsible for the comments made by AgentM. Remember, you can build any logical system with NOR.
        Actually, complex queries are the ones NOT to ditch. In fact, the answer is still no to all your questions.

        If you have a case where you think I'm wrong, I'd bet that a local cache would work better than sprucing up your perl with DB-like code. Let the DB do its job. Lots of people worked real hard to make it fast, efficient, and simple.

        --
        $you = new YOU;
        honk() if $you->love(perl)

      Well, this may be a case where you'd want to use the right tool for the job.

      You can use sql to do string manipulation (trim, substring, soundex, and so forth), but the sql server has to process those in addition to any joins, restriction, ordering, and so on.

      I have found that passing a sql query using DBI and then using perl to do any text processing is far more efficient in terms of development and speed of execution than trying to do it all in sql. Especially if you are trying to create a dynamic web page with up-to-date information from a relational database.

      Bottom line: SQL is the tool for querying relational databases; Perl is the tool for formatting the results. They work synergistically, and complement each other's strengths and limitations.

      -----
      "Computeri non cogitant, ergo non sunt"

Re: Replacing SQL with perl
by clemburg (Curate) on Dec 06, 2000 at 22:10 UTC

    For small data volumes - OK. Great. For mass data, where will you store all the stuff you "select into perl", e.g., to group it: in memory?

    That said, for a lot of situations it *is* a good thing to do some of the processing in Perl, if only to stay database vendor independent.

    (Updated to use the right id link syntax): See also Storing Complex Structures in a database.

    Christian Lemburg
    Brainbench MVP for Perl
    http://www.brainbench.com

Re: Replacing SQL with perl
by PsychoSpunk (Hermit) on Dec 06, 2000 at 22:49 UTC
    Your hunch about "depends" seems right. To me it's really a matter of optimization. You don't want to
    SELECT * FROM table
    if you're only going to use data that would have been better limited by adding a WHERE clause in your statement. Furthermore, SQL is better at handling data in databases, so beyond the situation with subselects, SQL is going to perform better in those other situations unless your Perl code does some wicked tricks, in which case, will you share?

    Notice that I left out subselects as a special case. This in my opinion, is the one type of query mentioned that I think could be handled capably by both Perl and SQL. Of course, it holds true to the "depends" hunch, in that Perl would only be more capable if the single SELECT queries were significantly less complex than the case of using a sub-select statement.

    Parsing-wise and other operations with text, Perl will win hands down, but looking at scale, the larger the returned dataset from an SQL query, the more load you throw at Perl. So, then you're looking at hardware issues.

    Perl can win, but only if your Perl code is optimized and your SQL isn't. Disclaimer: I haven't run any tests to see if I am right, but the only fair test would be to use identical machines (or the same machine for both the perl and the DBMS) in order to fairly test based on code. But after working with the DBI and numerous databases for a year, I just go with what will provide an adequate response from the application. My reply shows what I tend to do in my own code, and in the past, others familiar with the DBI have reviewed my code favorably.

    I'll finish by saying that short of trying every variation, you may not find a decent answer. There are optimization tricks in perl, the DBI, and SQL that will all be handy, and as such, it's not an easy question.

    ALL HAIL BRAK!!!

Re: Replacing SQL with perl
by BigJoe (Curate) on Dec 06, 2000 at 22:12 UTC
    Currently what I do on Oracle 8i is I create the stuff like joins, complex updates and some others as DataBase Proceedures. Then I can call them inside my SQL code. On out putting this data I try to let my SQL handle ordering and such (because the database is faster at it, let's face it: that is its job.) Plus our DB machines are twice as everything of our Web servers.

    I believe using these proceedures answers yes to all your questions. This is even more true when it is run many times.

    --BigJoe

    Learn patience, you must.
    Young PerlMonk, craves Not these things.
    Use the source Luke.
      whoa there! ORDER BY makes almost any Database, especially Oracle, crawl. And if you have multiple app servers, you might want to do your sorting outside the database. depends of course, but ORDER BY is a killer.
        Not if the indexes are set up correctly. Anything you ORDER BY on often enough to optimize, optimize with an index.

        --
        $you = new YOU;
        honk() if $you->love(perl)

Re: Replacing SQL with perl
by marius (Hermit) on Dec 06, 2000 at 22:40 UTC
    Yipes. It's an intriquing concept and if your knowledge of regex and perl is far superior to your knowledge of SQL, then yes, you have a valid point. However, SQL can do a number of things easily that perl can't (well, CPAN-less perl can't, at least) like modify date/time structs without complex accountability for time & date rollovers.

    Another thing you'd want to keep in mind is what size of database are you using? If you've got a database with <1000 rows this might be feasable. But your memory usage is going to increase quite a bit as your database size goes up as well, and when you get into large databases, well, let's just say that loading everything into physical RAM doesn't scale well, no matter how effecient the OS/interpreter/whathaveyou is.

    Since you mentioned subselects though, Perl could come into play there.. Since some db's (*cough*mysql*cough*) don't support subselects you could use perl to generate your SQL statement. I can't begin to imagine the error checking that would go into this, however.

    My $0.02 (and longest post to PM yet! (scary, no?)

    -marius
Re: Replacing SQL with perl
by turnstep (Parson) on Dec 06, 2000 at 23:08 UTC
    It really does come down to "it depends on what you are doing." You really have to know your database and your perl very well. In general, I leave most things to the database if possible, and use the SQL to return the smallest, most refined result set back to the script as possible. One thing that I almost always do with perl however, is sorting... SQL cannot touch some of perl's beauty in sorting:
    my $unsorted = $dbh->selectall_arrayref($FOOSQL) or die "Fooey, foo failed ($FOOSQL) $DBI::errstr\n"; for ( sort { $Q::SortBy == 1 ? ($a->[1] cmp $b->[1] or $a->[4] cmp $b->[4] or $a->[2] cmp $b->[2] +) : $Q::SortBy == 2 ? ($a->[1] cmp $b->[1] or $a->[2] cmp $b->[2]) : ($b->[3] <=> $a->[3] or $a->[2] cmp $b->[2]) } map {[$_, lc $_->[5], substr($_->[4],3,3), int $_->[1], lc $_->[3]]} @ +$unsorted) { ## Do something with $_ here... }
Re: Replacing SQL with perl
by lhoward (Vicar) on Dec 06, 2000 at 23:43 UTC
    In some instances you can write much quicker (execution time) code by doing some of the DB work in perl rather than doing it in the DB. The downside of doing it in your code is that your code takes longer to write and is harder to maintain.

    For instance, I have on numerous occasions loaded an entire table into a hash and refrenced data in it directly instead of doing a join in the SQL query. This in some cases has helped my program run as much as 20 times faster. This is only appropriate in some situations, such as when your program doesn't have to take into account any changes that might occur in the table in the DB after it was loaded.
    I firmly believe that all the data should be stored inside a RDMBS, just that sometimes you can achieve a big performance boost by doing "more work in perl, and less work in SQL".

Re: Replacing SQL with perl
by boo_radley (Parson) on Dec 06, 2000 at 23:28 UTC
    *ahem* *ahem*...

    "running with scissors! running with scissors!" :)(g,d,r)

    I would be especially nervous trying to do all the behind the scenes stuff (referential integrity, type enforcing,etc.) let alone trying to optimize ways to store and look up columns.

    There might be benefits to simple databases of small sizes, but for hefty applications, I'll stick with the Sybases, Oracles, and DB2s.

Re: Replacing SQL with perl
by jolhoeft (Sexton) on Dec 07, 2000 at 00:03 UTC
    In my experience, a lot depends on the size of your dataset. For a few hundred rows, unless you have very complex queries, a flat file is probably best. With complex queries it is much more a case of it depends. Depending on the table structure and the nature of the queries some complex things will be simpler in SQL, others in Perl. As a crude rule of thumb, if you must use a regex to search, use Perl.

    For some applications I'm looking at, with 10,000 to 20,000, the time involved in reading the flat file and parsing it is significant. The well structured, binary format of db files makes reading them quicker, in a sense they are preparsed.

    And finally, to repeat something someone mentioned earlier, it will be vastly easier to handle multiple scripts making changes at one time with a db then without.

Adding perl to SQL
by poet (Initiate) on Dec 07, 2000 at 13:36 UTC
    I've been dreaming about having (embedded) perl in my query language. Wouldn't it be nice if the db-engine supported perl regular expressions and functions?

    Or to have stored procedures in perl? For queries SQL is good, but for whiles, ifs and declares, perl's a lot better!

    Maybe this doesn't add anything you can't do by calling SQL from perl, but if you have to call the DB from, say, java. And for the next generation of query languages, I'm sure that perl would have some features to lend.

    What do you think, my brothers?

    Yours,
    Poet (first-time poster ;-)

Re: Replacing SQL with perl
by wardk (Deacon) on Dec 07, 2000 at 01:25 UTC

    Implenting essential functionality like indexes, foreign keys, triggers and other niceties would be perhaps fun to work, but it would be quite the effort to replace the value-added that comes with a professional RDMS.

    if the data involved is small, and would equate to a couple of isolated (no relationships) tables, then perhaps a flatfile scenario would be ok. but if you want to restrict columns to domains of data that would reside elsewhere (another table, file) then not leveraging what others have spent decades building would seem to be a bit illogical.

    Even lightweight databases with minimal advanced capabilities (i.e no stored procedures, triggers, replication, recovery) in most cases would still be advantages (IMHO) to using flat files. Today it's only one or two tables/files, but when does any meaningful project ever stay static?

    but I agree with your friends....it would very well "depend"

    I think this is a great topic. thanks for posting it...

Re: Replacing SQL with perl
by runrig (Abbot) on Dec 07, 2000 at 06:16 UTC
    On one script that used to take over two days to run, I got it running in under two hours by among other things, reading in some of the smaller static tables (< 50,000 rows - hey, I had the memory!) into hash arrays, then using the hash arrays instead of repeatedly querying the database. This was strictly to look things up by primary key, it was not to search/scan fields using wildcards or to let perl do any ordering. So sometimes it is worthwhile to read entire tables into memory depending on what you're trying to accomplish.
Re: Replacing SQL with perl
by belg4mit (Prior) on Dec 07, 2000 at 02:59 UTC
    I would have to agree with "none of the above" unless by easier you mean "more intuitive", in which case well that's up to the end user. But possibly. On a related note, why not try playing with Sprite.pm which is a subset of SQL in pure Perl for handling user-defined flat-file formats. It's not blazing, so you can nix that point. But it's not terribly slow either, especially for reasonably small to mid size applications.
Re: Replacing SQL with perl
by Maclir (Curate) on Dec 07, 2000 at 05:57 UTC
    Well, lets us assume you are working with a "serious" RDBMS (by that I mean Oracle, Informix or DB/2 - those where the engine is the result of substantial development work over many years). If performance is key, do as much in the engine as possible. That means, all that good SQL stuff, sorted procedures, triggers and so on.

    A good dba will look at the application requirements, and make sure there are the correct indexes across the whole database. Database optimisers (like those used in the above mentioned databases) are normally very good at squeezing the last bit of performance out of the engine and the hardware. After all, that to a large degree is what you are paying for - the top end performance.