PerlMonks  

Eliminating Dynamic SQL Generation with Decision Matrices for Tighter Large Scale Applications

by princepawn (Parson)
on Jan 16, 2002 at 17:38 UTC ( [id://139198] )

I am strongly against runtime SQL generation. It is my opinion that the mapping between a collection of program conditions and the resultant SQL queries represents a finite and describable space.

One form of dynamic SQL generation: if-then-if-then-if-then ad infinitum

In other words, I want to remove things like this:

    use strict;
    use CGI;
    use DBI;

    my $cgi = ...;
    my $dbh = ...;

    my (@field, @where);
    push @field, 'age'    if defined $cgi->param('age');
    push @field, 'gender' if defined $cgi->param('gender');
    if (my $weight = $cgi->param('weight_constraint')) {
        push @where, sprintf 'weight %s', $weight;
    }
    my $sql = sprintf 'SELECT %s FROM user WHERE %s;',
        join(',', @field), join(' and ', @where);
    $dbh->selectall_arrayref($sql);
and also...

Modules supporting dynamic sql generation

I also don't think that a more structured approach to dynamic SQL generation, such as that represented by DBIx::Recordset, DBIx::Abstract, DBIx::SQLEngine, DBIx::SearchBuilder, et al., makes things any better, because there are...

Issues with runtime SQL generation

  1. time. The time consumed breaks down into these types:
    1. time to execute the conditionals. This is the same for both approaches.
    2. time to build the data structures which then get joined, sprintf'ed, and otherwise folded into a final string.
    3. if the module is sophisticated, it will also spend time caching the resulting query to avoid creating it again, flexibly caching it either to disk or to memory using something like Cache::Cache. The advantage of the static approach (exemplified by SQL::Catalog) is that the caching, to either place, happens before the application starts.
  2. tuning. It is possibly advisable to have a database expert create well-tuned static queries (static meaning the only variability is in the placeholders) as opposed to more general on-the-fly queries.
  3. task division. Again, allowing the database expert to code up the queries satisfying a specification on his own, while the Perl programmer accesses them purely by label (as exemplified in SQL::Catalog), provides the independence that large-scale applications need.
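The "static with placeholders" idea in point 2 can be sketched like so. This is only the shape of the approach, not SQL::Catalog's actual interface; the labels, table, and column names are made up for illustration:

```perl
use strict;
use warnings;

# A static query set: the SQL text is fixed before the application
# starts, and the only runtime variability is in placeholder values.
my %catalog = (
    users_by_weight => 'SELECT age, gender FROM user WHERE weight > ?',
    users_all       => 'SELECT age, gender FROM user',
);

# At runtime the application picks a label and supplies bind values;
# a DBA can tune each statement in isolation beforehand.
my $sql = $catalog{users_by_weight};
# $dbh->selectall_arrayref($sql, undef, $min_weight);  # with a live handle
```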

The only thing lacking from the static approach

Is some sort of structure which maps a collection of cases to a collection of SQL labels. I am thinking about Switch but also about decision trees.
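One minimal shape for such a mapping, sketched as a flat decision matrix rather than a tree (all labels and condition names hypothetical):

```perl
use strict;
use warnings;

# Each combination of boolean conditions maps to a pre-registered
# SQL label; the labels here stand in for entries in a catalog.
my %query_for = (
    ''           => 'user_all',
    'age'        => 'user_by_age',
    'gender'     => 'user_by_gender',
    'age,gender' => 'user_by_age_and_gender',
);

# Build a canonical key from whichever conditions are true and
# look up the label; undef means no static query was registered.
sub pick_label {
    my %cond = @_;
    my $key = join ',', sort grep { $cond{$_} } keys %cond;
    return $query_for{$key};
}

my $label = pick_label(age => 1, gender => 1);
# $label is now 'user_by_age_and_gender'
```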

Can anyone provide any pointers to a good means of such a mapping? Thanks.

Replies are listed 'Best First'.
Re: Eliminating Dynamic SQL Generation with Decision Matrices for Tighter Large Scale Applications
by footpad (Abbot) on Jan 16, 2002 at 20:58 UTC

    Is [there] some sort of structure which maps a collection of cases to a collection of SQL labels?

    I think you have a very flexible, structured tool available, one that lets you create the exact results you're looking for. Specifically, your database.

    Determine an initial set of queries and their most common variations. Create a set of tables that express these variations well, and then use queries, views, stored procs, or whatever you want to provide a run time interface that lets the user:

    • Choose the query they're looking for from the list of queries in your tables.

    • Select common variations (e.g. different values for the ORDER BY clause) from a subset related to the main query they chose.

    • Enter placeholder values as needed.

    As you can probably tell, this requires multiple tables. However, they will likely be very tiny tables, so there should be few performance concerns.

    Maintenance should be simplified because it then becomes a data editing task rather than a source modification. If your DBA creates a view to replace a complex WHERE clause, you should be able to incorporate it with a simple UPDATE query. If your end-user's new query turns out to be an alternate ORDER BY clause, you INSERT it into the appropriate detail table. If you model this properly, your Perl code should be very simple and basic.
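    That data-driven lookup might be sketched as follows, with the query master table and ORDER BY detail table simulated as in-memory rows (in a real system these would be actual database tables; all names and SQL are hypothetical):

```perl
use strict;
use warnings;

# Simulated "queries" master table and "order_by" detail table.
my @queries = (
    { id => 1, label => 'Users by weight',
      sql => 'SELECT age, gender FROM user WHERE weight > ?' },
);
my @order_by = (
    { query_id => 1, label => 'By age',    clause => 'ORDER BY age' },
    { query_id => 1, label => 'By gender', clause => 'ORDER BY gender' },
);

# The user picks a query and one of its registered variations;
# the final statement is assembled from data, not from code.
sub build_query {
    my ($query_id, $order_label) = @_;
    my ($q)  = grep { $_->{id} == $query_id } @queries;
    my ($ob) = grep { $_->{query_id} == $query_id
                      && $_->{label} eq $order_label } @order_by;
    return join ' ', $q->{sql}, $ob ? $ob->{clause} : ();
}

my $sql = build_query(1, 'By age');
# 'SELECT age, gender FROM user WHERE weight > ? ORDER BY age'
```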

    In short, consider treating this as a data problem, not a language/module problem.

    --f

Re: Eliminating Dynamic SQL Generation with Decision Matrices for Tighter Large Scale Applications
by lachoy (Parson) on Jan 16, 2002 at 20:06 UTC

    This is an interesting post. I agree with the second and third issues you bring up (tuning and task division), although I think that in many applications the third issue isn't sizable since it's the same developer with a different hat on :-) But you're focusing on large-scale applications so I understand.

    Regarding the first issue: I'm curious if you've measured the time needed to build the queries using the methods you describe here, or others more complicated.

    I like the general concept behind generating SQL statements beforehand, particularly for queries that rely on multiple tables, because it's easier to debug and to test. However, this seems to conflict with the desire of many users to perform ad-hoc queries, even if constrained by a simple web form. In cases like this, do you think it's a good idea to generate all the queries beforehand? This would seem to be a query-management nightmare since you have to account for every combination of criteria.

    This might lead to a hybrid system where often-used queries are generated beforehand but ad-hoc queries are generated (partly at least) at run/request-time. I'm thinking of including something like this in SPOPS as it would add a new wrinkle to managing object relationships.
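    A hybrid like that might look roughly like this: consult the hand-tuned static set first, and fall back to run-time generation for unregistered combinations. The names are hypothetical and the dynamic branch is deliberately simplistic:

```perl
use strict;
use warnings;

# Often-used field combinations get pre-written, DBA-tuned SQL.
my %static = (
    'age'        => 'SELECT age FROM user',
    'age,gender' => 'SELECT age, gender FROM user',
);

sub sql_for_fields {
    my @fields = sort @_;
    my $key = join ',', @fields;
    # Known combination: use the static statement...
    return $static{$key} if exists $static{$key};
    # ...anything else is generated on the fly.
    return sprintf 'SELECT %s FROM user', join ', ', @fields;
}
```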

    Chris
    M-x auto-bs-mode

      ...generating SQL statements beforehand... seems to conflict with the desire of many users to perform ad-hoc queries, even if constrained by a simple web form.

      It all depends on the size of the database, the type of queries, and the load that is placed on the database (i.e. the number of parallel instances of such ad-hoc queries).

      Personally I've worked on several large systems (for example the now defunct eCircles.com site) where all access to the database is via pre-defined stored procedures (which is equivalent to pre-defined queries, really). Obviously it requires some thought to define the set of operations that users should be allowed to perform, but it has the huge advantage of separating the SQL from the application. If a query performs badly it can be tuned by the DBA (or the SQL developer) without touching the Perl code (as long as the input and output stay the same, of course).
      The queries can also be analyzed in isolation to ensure decent performance.

      It is, IMHO, a must.

      Michael

Re: Eliminating Dynamic SQL Generation with Decision Matrices for Tighter Large Scale Applications
by talexb (Chancellor) on Jan 16, 2002 at 19:23 UTC
    I am not sure I have understood all of your meditation, but let's see if I can address the part titled Issues with runtime SQL generation.

    In your example you show a select statement that grabs certain fields (@field) from a table if the CGI variable has been passed in, based on certain conditions (@where).

    I suppose an attempt to optimize would have a BEGIN statement generate all of the possible permutations of @fields and @where. The load time (i.e. time to execute the BEGIN) and the memory required to store the results could be considerable, based on the size of each of the arrays.
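    The cost of that load-time enumeration can be estimated by actually generating the permutations. A sketch that pre-builds one SELECT per non-empty subset of the fields (field names hypothetical, and WHERE variations omitted for brevity):

```perl
use strict;
use warnings;

my @fields = qw(age gender weight);
my %sql_for;   # field-subset key => pre-built SELECT statement

# One statement per non-empty subset of @fields: 2**n - 1 of them,
# which is exactly where the load-time and memory cost comes from.
for my $mask (1 .. 2**@fields - 1) {
    my @subset = map  { $fields[$_] }
                 grep { $mask & (1 << $_) } 0 .. $#fields;
    $sql_for{ join ',', @subset }
        = sprintf 'SELECT %s FROM user', join ', ', @subset;
}
# For three fields this yields 7 statements; adding WHERE clause
# variations multiplies the count further.
```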

    About the only database tuning that I can think of for a select statement would be indexing on the constrained variables; in this example, weight would be an indexed column.

    Caching queries might be possible and even quite efficient, but I don't know if that type of thing can easily be generalized. In this example of a simple select statement, it's likely easier to just generate the SQL and grab the data than to search through a collection of cached queries, based on which fields and constraints have been requested.

    Is there a more complicated example to explore the issues that you're interested in pursuing?

    --t. alex

    "Of course, you realize that this means war." -- Bugs Bunny.

Re: Eliminating Dynamic SQL Generation with Decision Matrices for Tighter Large Scale Applications
by poqui (Deacon) on Jan 16, 2002 at 22:25 UTC
    I agree with footpad, that this is a data problem.
    In Oracle, and especially for Data Warehousing, there is the concept of "Materialized Views", which is an attempt to solve this problem partway between coding actual tables to hold queries and still allowing users to enter any query they please. The (potentially) very long execution times for certain queries make this a valuable approach, especially when the underlying data doesn't change very quickly (materialized views can be refreshed, but in my experience the time cost is greater than or equal to that of the original query).
    Certain front end tools (like Brio) do have varying user levels as well, so that less experienced users have a set of canned queries to choose from (and these are stored in the Oracle database behind it), and the more experienced may enter their own queries.
    This approach gives the kind of control a DBA craves, while still allowing a modest level of freedom for users.
Re: Eliminating Dynamic SQL Generation with Decision Matrices for Tighter Large Scale Applications
by perrin (Chancellor) on Jan 16, 2002 at 23:37 UTC
    If you use prepare_cached(), you can avoid having to prepare the generated queries more than once or build DBI query objects for them. That's the only expensive part of generating SQL. As for the tuning and task division, I just try to minimize the variations in the generated SQL so that it can be optimized by a DBA once and then left alone. If necessary, you can dump out all the variations once and let your DBA test them all.

    If you were to literally put every variation of SQL in your program, you would have a maintenance nightmare with tons of duplication in the SQL code.

Re: Eliminating Dynamic SQL Generation with Decision Matrices for Tighter Large Scale Applications
by demerphq (Chancellor) on Jan 16, 2002 at 20:15 UTC
    Hmm. Regarding your point 1: I'd be pretty surprised if the issues in this point are non-trivial. In fact I'd be pretty surprised if 1.3 wasn't considerably slower than 1.2 for 99% of applications... (think file IO versus in-memory string manipulation).

    2 and 3 however are valid points that I will be putting some thought into with regards to my current project. Thanks.

    Yves / DeMerphq
    --
    When to use Prototypes?

Re: Eliminating Dynamic SQL Generation with Decision Matrices for Tighter Large Scale Applications
by ignatz (Vicar) on Jan 17, 2002 at 02:42 UTC
    Very interesting meditation.
    Is a satisfactory answer really possible? It often feels to me like relational database design and SQL are being pushed to the breaking point within applications.
Re: Eliminating Dynamic SQL Generation with Decision Matrices for Tighter Large Scale Applications
by simonm (Vicar) on Mar 28, 2002 at 07:00 UTC
    Saying you're strongly against runtime SQL generation strikes me as similar to saying that you're against procedural programming... It leaves out the context of the problem that you're solving.

    If you're building a script with a known number of queries, a static collection of queries may adequately address your needs, and if your queries are fairly complicated or high-volume, I can understand putting in the effort to optimize them.

    However, if you're building a more open-ended system with various ad-hoc query capabilities, such as an OO-RDBMS mapper, you might find that pre-writing all of the queries one might possibly use is laborious, and properly selecting amongst them (the "only thing lacking", above) becomes an increasingly complex problem.

    There must actually be a whole range of possibilities here, ranging from static, to mostly static with some dynamic subclauses, to mostly dynamic with a few static hand-optimized queries, to completely dynamic, and all sorts of other variations. Each of those techniques will be good at solving some kinds of problem, and less useful for others... And of course, the quality of the individual library's implementation, and the developer's level of comfort with the approach, are factors that are at least as important as whether the SQL is written in advance or not.

    Without a discussion of the context -- what you're doing and why -- declarations of jihad against dynamic SQL (or against dynamic binding, or against Microsoft, or whatever) have a ring of hollow ideology to my ear.

Re: Eliminating Dynamic SQL Generation with Decision Matrices for Tighter Large Scale Applications
by trs80 (Priest) on Jan 18, 2002 at 10:39 UTC
    I think the dynamic creation of SQL is used when flexibility is required. If the queries are always the same, it would be enough to store them in a list or hash and then access them. I worked on a concept once where all the SQL would be stored in a file like so:

        %sql_queries = (
            1 => qq~SELECT blah FROM blah WHERE id = ?~,
            2 => qq~SELECT blah,more_blah FROM blah WHERE id = ? and name = ?~,
        );

    Then for development the programmers would refer to the hash and access the right SQL via the correct key. Now this was just an idea that was tossed around and no real project was developed with it, but the idea brings up the rules you talk about in order to "choose" the right query. I would be interested in seeing your idea of how a "rules" file would be structured for something like this.
Re: Eliminating Dynamic SQL Generation with Decision Matrices for Tighter Large Scale Applications
by seanbo (Chaplain) on Jan 18, 2002 at 00:08 UTC
    It's interesting that you bring this up. Over the past few weeks, I have been working on a web form that will allow my users to select some params from a webpage and generate a set of graphs based on their selections.

    Aside from needing to create some indexes to speed the initial load of the page and selection boxes, I previously hadn't had any concerns about coding in the manner I have. Now, I am concerned that I should rethink my position.

    Just to give some background. I have some input in the initial db design, but it needs to be setup in a manner where there is really no maintenance by the users. Once I turn it over to them, it's theirs. I never ever see it again. It will be fed by a .csv file once per month.

    How would some of you propose I alter this code?

    The resultant webpage is here.

    As a side note... I am currently developing it using MySQL, but the backend db will become MS Access.

    perl -e 'print reverse qw/o b n a e s/;'
