I am strongly against runtime SQL generation. It is my opinion that
the mapping between a collection of program conditions and the
resultant SQL queries represents a finite and describable space.
One form of dynamic SQL generation: if-then-if-then-if-then ad infinitum
In other words, I want to remove things like this:
use CGI;
use DBI;

my $cgi = ...;
my $dbh = ...;

my (@field, @where);
push @field, 'age'    if $cgi->param('age');
push @field, 'gender' if $cgi->param('gender');
if ($cgi->param('weight_constraint')) {
    push @where, sprintf 'weight %s', $cgi->param('weight_constraint');
}
my $sql = sprintf "SELECT %s FROM user WHERE %s;",
    join(',', @field), join(' and ', @where);
$dbh->selectall_arrayref($sql);
and also...
Modules supporting dynamic SQL generation
I also don't think that a more structured approach to dynamic SQL
generation such as is represented in DBIx::Recordset,
DBIx::Abstract, DBIx::SQLEngine,
DBIx::SearchBuilder, et al makes
things any better because there are...
Issues with runtime SQL generation
1. time. The time consumed breaks down into these types:
   1.1 time to execute the conditionals. This is the same for both approaches.
   1.2 time to build the data structures which then get joined, sprintf'ed,
       and otherwise folded into a final string.
   1.3 if the module is sophisticated, it will also spend time caching the
       resulting query to avoid creating it again, flexibly caching it either
       to disk or to memory using something like Cache::Cache. The advantage
       of the static approach (exemplified by SQL::Catalog) is that the
       caching to either place happens before the application starts.
2. tuning. It is possibly advisable to have a database expert create
   well-tuned static queries (static meaning the only variability is in the
   placeholders) as opposed to more general, on-the-fly queries.
3. task division. Again, allowing the database expert to code up the queries
   satisfying a specification on his own, while the Perl programmer accesses
   them purely by label (as exemplified in SQL::Catalog), provides the
   independence that large-scale applications need.
The only thing lacking from the static approach
is some sort of structure which maps a collection of cases to a collection
of SQL labels. I am thinking about Switch, but also about decision trees.
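To make the idea concrete, here is a minimal sketch of the kind of case-to-label mapping I have in mind. The labels and parameter names are hypothetical; with SQL::Catalog each label would name a query registered ahead of time.

```perl
use strict;
use warnings;

# Hypothetical labels: each one would name a pre-written, pre-tuned
# query registered ahead of time (e.g. with SQL::Catalog).
my %query_for = (
    'age'        => 'user_by_age',
    'gender'     => 'user_by_gender',
    'age,gender' => 'user_by_age_gender',
);

# Build a key from whichever parameters actually arrived, then look up
# the label of the matching canned query.
sub query_label {
    my (%param) = @_;
    my $key = join ',', sort grep { defined $param{$_} } keys %param;
    return $query_for{$key};    # undef: no canned query for this case
}
```

So `query_label(age => 30)` selects the 'user_by_age' query, and a case with no entry falls through to undef, which the application can treat as "unsupported combination".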
Can anyone provide any pointers to a good means of such a mapping? Thanks.
Re: Eliminating Dynamic SQL...
by footpad (Abbot) on Jan 16, 2002 at 20:58 UTC
Is [there] some sort of structure which maps a collection of cases to a collection of SQL labels?
I think you have a very flexible, structured tool available, one that lets you create the exact results you're looking for. Specifically, your database.
Determine an initial set of queries and their most common variations. Create a set of tables that express these variations well, and then use queries, views, stored procs, or whatever you want to provide a run time interface that lets the user:
Choose the query they're looking for from the list of queries in your tables.
Select common variations (e.g. different values for the ORDER BY clause) from a subset related to the main query they chose.
Enter placeholder values as needed.
As you can probably tell, this requires multiple tables. However, they will likely be very tiny tables, so there should be few performance concerns.
Maintenance should be simplified because it then becomes a data editing task, rather than a source modification. If your DBA creates a view to replace a complex WHERE clause, then you should be able to incorporate it using a simple UPDATE query. If your end-user's new query turns out to be an alternate ORDER BY clause, you INSERT it into the appropriate detail table. If you model this properly, your Perl code should be very simple and basic.
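As a rough illustration of treating it as a data problem: a tiny catalog table maps a label to canned SQL, and the Perl side only ever looks queries up by label. The table and column names here are made up, and it assumes DBD::SQLite purely to stay self-contained.

```perl
use strict;
use warnings;
use DBI;

# Self-contained sketch: SQLite in-memory; all names are hypothetical.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

# The "catalog" table the DBA maintains: label -> canned SQL text.
$dbh->do('CREATE TABLE query_catalog (label TEXT PRIMARY KEY, sql_text TEXT)');
$dbh->do(q{INSERT INTO query_catalog VALUES ('user_by_age',
           'SELECT name FROM user WHERE age = ?')});

# Some application data to query against.
$dbh->do('CREATE TABLE user (name TEXT, age INTEGER)');
$dbh->do(q{INSERT INTO user VALUES ('alice', 30)});

# The Perl side never builds SQL; it only looks queries up by label.
sub run_named_query {
    my ($label, @bind) = @_;
    my ($sql) = $dbh->selectrow_array(
        'SELECT sql_text FROM query_catalog WHERE label = ?', undef, $label);
    die "no query registered under '$label'\n" unless defined $sql;
    return $dbh->selectall_arrayref($sql, undef, @bind);
}
```

Tuning a query then means an UPDATE against query_catalog, with no change to the Perl code.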
In short, consider treating this as a data problem, not a language/module problem.
--f
Re: Eliminating Dynamic SQL Generation with Decision Matrices for Tighter Large Scale Applications
by lachoy (Parson) on Jan 16, 2002 at 20:06 UTC
This is an interesting post. I agree with the second and third
issues you bring up (tuning and task division), although I think that
in many applications the third issue isn't sizable since it's the same
developer with a different hat on :-) But you're focusing on
large-scale applications so I understand.
Regarding the first issue: I'm curious if you've measured the time
needed to build the queries using the methods you describe here, or
others more complicated.
I like the general concept behind generating SQL statements
beforehand, particularly for queries that rely on multiple tables,
because it's easier to debug and to test. However, this seems to
conflict with the desire of many users to perform ad-hoc queries, even
if constrained by a simple web form. In cases like this, do you think
it's a good idea to generate all the queries beforehand? This
would seem to be a query-management nightmare since you have to
account for every combination of criteria.
This might lead to a hybrid system where often-used queries are
generated beforehand but ad-hoc queries are generated (partly at
least) at run/request-time. I'm thinking of including something like
this in SPOPS as it would add a new wrinkle to managing
object relationships.
Chris
M-x auto-bs-mode
...generating SQL statements beforehand... seems to conflict with the desire of many users to perform ad-hoc queries, even if constrained by a simple web form.
It all depends on the size of the database, the type of queries, and the load that is placed on the database (i.e. the number of parallel instances of such ad-hoc queries).
Personally I've worked on several large systems (for example the now defunct eCircles.com site) where all access to the database is via pre-defined stored procedures (which is equivalent to pre-defined queries, really). Obviously it requires some thought to define the set of operations that users should be allowed to perform, but it has the huge advantage of separating the SQL from the application. If a query performs badly it can be tuned by the DBA (or the SQL developer) without touching the Perl code (as long as the input and output stay the same, of course).
The queries can also be analyzed in isolation to ensure decent performance.
It is, IMHO, a must.
Michael
Re: Eliminating Dynamic SQL Generation with Decision Matrices for Tighter Large Scale Applications
by talexb (Chancellor) on Jan 16, 2002 at 19:23 UTC
I am not sure I have understood all of your meditation, but let's see if I can address the part titled Issues with runtime SQL generation.
In your example you show a select statement that grabs certain fields (@field) from a table if the CGI variable has been passed in, based on certain conditions (@where).
I suppose an attempt to optimize would have a BEGIN block generate all of the possible permutations of @field and @where. The load time (i.e. the time to execute the BEGIN) and the memory required to store the results could be considerable, depending on the size of each of the arrays.
About the only database tuning that I can think of for a select statement would be indexing on the constrained columns; in this example, weight would be an indexed column.
Caching queries might be possible and even quite efficient, but I don't know if that type of thing can easily be generalized. In this example of a simple select statement, it's likely easier to just generate the SQL and grab the data than to search through a collection of cached queries, based on which fields and constraints have been requested.
Is there a more complicated example to explore the issues that you're interested in pursuing?
--t. alex
"Of course, you realize that this means war." -- Bugs Bunny.
Re: Eliminating Dynamic SQL Generation with Decision Matrices for Tighter Large Scale Applications
by poqui (Deacon) on Jan 16, 2002 at 22:25 UTC
I agree with footpad that this is a data problem. In Oracle, and especially for Data Warehousing, there is the concept of "Materialized Views", which is an attempt to solve this problem: partway between coding actual tables to hold query results and still allowing the users to enter any query they please. The (potentially) very long execution times for certain queries make this a valuable approach, especially when the underlying data doesn't change very quickly (materialized views can be refreshed, but in my experience the time cost is >= that of the original query).
Certain front end tools (like Brio) do have varying user levels as well, so that less experienced users have a set of canned queries to choose from (and these are stored in the Oracle database behind it), and the more experienced may enter their own queries.
This approach gives the kind of control a DBA craves, while still allowing a modest level of freedom for users.
Re: Eliminating Dynamic SQL Generation with Decision Matrices for Tighter Large Scale Applications
by perrin (Chancellor) on Jan 16, 2002 at 23:37 UTC
If you use prepare_cached(), you can avoid having to prepare the generated queries more than once or build DBI query objects for them. That's the only expensive part of generating SQL. As for the tuning and task division, I just try to minimize the variations in the generated SQL so that it can be optimized by a DBA once and then left alone. If necessary, you can dump out all the variations once and let your DBA test them all.
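For instance, here is a minimal runnable sketch of prepare_cached() in action (SQLite in-memory, made-up table): repeated calls with identical SQL text get the cached statement handle back instead of paying the prepare cost again.

```perl
use strict;
use warnings;
use DBI;

# Self-contained sketch: SQLite in-memory; the table and data are made up.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do('CREATE TABLE user (name TEXT, age INTEGER)');
$dbh->do(q{INSERT INTO user VALUES ('bob', 42)});

sub fetch_users_by_age {
    my ($age) = @_;
    # Identical SQL text gets the same cached statement handle back,
    # so DBI prepares this statement only once per connection.
    my $sth = $dbh->prepare_cached('SELECT name FROM user WHERE age = ?');
    $sth->execute($age);
    return $sth->fetchall_arrayref;
}
```

Even if the SQL string is assembled at runtime, only the first occurrence of each distinct string costs a prepare; every later occurrence is a hash lookup.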
If you were to literally put every variation of SQL in your program, you would have a maintenance nightmare with tons of duplication in the SQL code.
Re: Eliminating Dynamic SQL Generation with Decision Matrices for Tighter Large Scale Applications
by demerphq (Chancellor) on Jan 16, 2002 at 20:15 UTC
Hmm. Regarding your point 1: I'd be pretty surprised if the issues in this point are non-trivial. In fact, I'd be pretty surprised if 1.3 wasn't considerably slower than 1.2 for 99% of applications... (think file IO versus in-memory string manipulation).
2 and 3 however are valid points that I will be putting some thought into with regards to my current project. Thanks.
Yves / DeMerphq
--
When to use Prototypes?
Re: Eliminating Dynamic SQL Generation with Decision Matrices for Tighter Large Scale Applications
by ignatz (Vicar) on Jan 17, 2002 at 02:42 UTC
Very interesting meditation.
Is a satisfactory answer really possible? It often feels to me like relational database design and SQL are being pushed to the breaking point within applications.
Re: Eliminating Dynamic SQL Generation with Decision Matrices for Tighter Large Scale Applications
by simonm (Vicar) on Mar 28, 2002 at 07:00 UTC
Saying you're strongly against runtime SQL generation strikes me as similar to saying that you're against procedural programming... It leaves out the context of the problem that you're solving.
If you're building a script with a known number of queries, a static collection of queries may adequately address your needs, and if your queries are fairly complicated or high-volume, I can understand putting in the effort to optimize them.
However, if you're building a more open-ended system with various ad-hoc query capabilities, such as an OO-RDBMS mapper, you might find that pre-writing all of the queries one might possibly use is laborious, and properly selecting amongst them (the "only thing lacking", above) becomes an increasingly complex problem.
There must actually be a whole range of possibilities here, ranging from static, to mostly static with some dynamic subclauses, to mostly dynamic with a few static hand-optimized queries, to completely dynamic, and all sorts of other variations. Each of those techniques will be good at solving some kinds of problem, and less useful for others... And of course, the quality of the individual library's implementation, and the developer's level of comfort with the approach, are factors that are at least as important as whether the SQL is written in advance or not.
Without a discussion of the context -- what you're doing and why -- declarations of jihad against dynamic SQL (or against dynamic binding, or against Microsoft, or whatever) have a ring of hollow ideology to my ear.
Re: Eliminating Dynamic SQL Generation with Decision Matrices for Tighter Large Scale Applications
by trs80 (Priest) on Jan 18, 2002 at 10:39 UTC
I think the dynamic creation of SQL is used when flexibility
is required. If the queries are always the same, it would be
enough to store them in a list or hash and then access them by key.
I worked on a concept once where all the SQL would be stored
in a file like so:
%sql_queries = (
    1 => qq~SELECT blah FROM blah WHERE id = ?~,
    2 => qq~SELECT blah,more_blah FROM blah WHERE id = ? and name = ?~,
);
Then for development, the programmers would refer to the hash
and access the right SQL via the correct key.
Now, this was just an idea that was tossed around and no real
project was developed with it, but the idea brings up the point
of the rules you talk about in order to "choose" the right query.
I would be interested in seeing your idea of how a "rules" file
would be structured for something like this.
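One plausible shape for such a rules file, sketched here with hypothetical predicates keyed to the two queries above: each rule pairs a test over the incoming parameters with a query key, and the first rule that matches wins.

```perl
use strict;
use warnings;

# Ordered (predicate => query key) pairs; the keys are meant to match
# a %sql_queries hash like the one above. The first predicate that is
# true for the incoming parameters selects the canned query.
my @rules = (
    [ sub { $_[0]->{id} && $_[0]->{name} } => 2 ],
    [ sub { $_[0]->{id} }                  => 1 ],
);

sub pick_query {
    my ($param) = @_;
    for my $rule (@rules) {
        my ($predicate, $key) = @$rule;
        return $key if $predicate->($param);
    }
    return;    # no rule matched: no canned query for this case
}
```

Ordering the rules from most to least specific plays the same role a decision tree would, just flattened into a list.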
Re: Eliminating Dynamic SQL Generation with Decision Matrices for Tighter Large Scale Applications
by seanbo (Chaplain) on Jan 18, 2002 at 00:08 UTC
|
It's interesting that you bring this up. Over the past few weeks, I have been working on a web form that will allow my users to select some parms from a webpage and generate a set of graphs based on their selections.
Aside from needing to create some indexes to speed the initial load of the page and selection boxes, I previously hadn't had any concerns about coding in the manner I have. Now, I am concerned that I should rethink my position.
Just to give some background. I have some input in the initial db design, but it needs to be setup in a manner where there is really no maintenance by the users. Once I turn it over to them, it's theirs. I never ever see it again. It will be fed by a .csv file once per month.
How would some of you propose I alter this code?
The resultant webpage is here.
As a side note... I am currently developing it using MySQL, but the backend db will become MS Access.
perl -e 'print reverse qw/o b n a e s/;'