Magical SQL

Idea co-author: asdfgroup.
I will explain this idea on CGI example, but this idea is not only for CGIs.
Did you seen before code like this one:

my %Q = CGI::Vars();
...
$id_cust = AddCusomer($Q{login}, $Q{pass}, $Q{name}, $Q{email}, ...)
    if $Q{pass} eq $Q{confirmpass};
...
sub AddCustomer {
    my ($login, $pass, $name, $email, ...) = @_;
    $dbh->do("INSERT INTO Customer SET login=?, pass=?, name=?,
        email=?, ...", undef, $login, $pass, $name, $email, ...)
        or return;
    $dbh->selectrow_array("SELECT LAST_INSERT_ID()");
}
[download]

How many times word "login" repeated? I think: too many! How about code like this one:

my %Q = CGI::Vars();
...
$id_cust=$dbh->Insert("Customer", \%Q) if $Q{pass} eq $Q{confirmpass};
[download]

Huh? :-) And now I explain how this work, how keep it secure (this is very important, becouse %Q defined by user in CGI params), simple and usable.

We found simple and clean algorithm how to construct SQL queries from table name and hash with fields. Of course, this is possible now only for simple SQL queries. But our survey results that 98% of SQL queries in typical CGI are simple and can be executed magically with our algorithm.

In short, the algorithm is:

Analize database and tables with DESC query. This give us all field names and all primary key names for all tables.
Analize given %Q and ignore all keys which not equal to field names in given table "Customer".
From other keys construct requested SQL query:
- Insert() INSERT INTO Table SET "key=value" joined on ","
- Update() UPDATE Table SET "nonprimarykey=value" joined on "," WHERE "primarykey=value"
- Replace() REPLACE Table SET "key=value" joined on ","
- Delete() DELETE FROM Table WHERE "key=value" joined on "AND"
- ID() SELECT primarykey FROM Table WHERE "key=value" joined on "AND"
- Count() SELECT count(*) as __count FROM Table WHERE "key=value" joined on "AND"
- Select() SELECT * FROM Table WHERE "key=value" joined on "AND"
Execute query (for Insert() execute additional query: "SELECT LAST_INSERT_ID()").

Example:

# CREATE TABLE Customer (
#       id_cust int auto_increment primary key,
#       login varchar(255) not null,
#       pass varchar(255) not null,
#       name varchar(255) not null,
#       email varchar(255) not null,
#   ) ;

# INSERT INTO Customer SET login='powerman', pass='11111',
#   name='Alex', email='root@localhost' ;
# SELECT LAST_INSERT_ID() ;
%Q = (
    login   => "powerman",
    pass    => "11111",
    name    => "Alex",
    email   => "root@localhost",
    BADKEY  => "QWEQWE",
    );
$id_cust = $dbh->Insert("Customer", \%Q);

# UPDATE Customer SET pass='22222', name='Alex Efros'
#   WHERE id_cust=1 ;
%Q = (
    id_cust => 1,
    pass    => '22222',
    name    => 'Alex Efros',
    );
$dbh->Update("Customer", \%Q);

# REPLACE INTO Customer SET id_cust=1, pass='33333' ;
$dbh->Replace("Customer", {id_cust=>1, pass=>'33333'});

# DELETE FROM Customer WHERE login='powerman' AND pass='33333' ;
$dbh->Delete("Customer", {login=>"powerman", pass=>"33333"});

# SELECT id_cust FROM Customer WHERE login='powerman' AND pass='33333'
+ ;
$id_cust = $dbh->ID("Customer", {login=>'powerman', pass=>'33333'})

# SELECT count(*) as __count FROM Customer
$count = $dbh->Count("Customer");

# SELECT * FROM Customer
@all_customers = $dbh->Select("Customer");

# SELECT * FROM Customer WHERE id_cust=1 ;
$Cust = $dbh->Select("Customer", {id_cust=>1});
[download]

But these queries are not simple, they are too simple. How about more realistic queries? No problem! ;-) At this time our algorithm can produce queries like this one:

# CREATE TABLE Cust (
#       id_cust int auto_increment primary key,
#       id_country int,
#       age int,
#       join_date datetime,
#       _balance decimal(8,2),
#   );
#   CREATE TABLE SelectedGoods (
#       id_sg int auto_increment primary key,
#       id_cust int,
#       id_goods int,
#   );
#   CREATE TABLE Goods (
#       id_goods int auto_increment primary key,
#       goods varchar(255) not null,
#       cost decimal(8,2) not null,
#   );
#   CREATE TABLE Country (
#       id_country int auto_increment primary key,
#       country varchar(255) not null,
#   );

# SELECT
#   Cust.id_cust, Cust.id_country, Cust.age,
#   Cust.join_date, Cust._balance,
#   SelectedGoods.id_sg, SelectedGoods.id_cust,
#   SelectedGoods.id_goods,
#   Goods.goods, Goods.cost,
#   Country.country
# FROM
#   Cust
#   INNER JOIN SelectedGoods ON (Cust.id_cust = SelectedGoods.id_cust)
#   INNER JOIN Goods ON (SelectedGoods.id_goods = Goods.id_goods)
#   INNER JOIN Country ON (Cust.id_country = Country.id_country)
# WHERE
#   Country.country =  'Ukraine'    AND
#   Cust.age        >= 25           AND
#   Cust.join_date  >= DATE_ADD(NOW(), INTERVAL -1 DAY) AND
#   Cust._balance   >  5.50         AND
#   Goods.goods     IN ('Monitor LG','CDROM 52x')
# ORDER BY
#   Cust._balance DESC,
#   Cust.age ASC ;
%Q = (
    country             => 'Ukraine',
    age__ge             => 25,
    join_date__date_ge  => "-1 DAY",
    _balance__gt        => 5.50,
    goods               => ["Monitor LG", "CDROM 52x"],
    -order              => ["_balance DESC", "age"],
    );
@search_result = $dbh->Select(["Customer", "SelectedGoods",
    "Goods", "Country"], \%Q);
[download]

To make this possible we define some conventions:

Field names start from /^a-z_/ and can't contain "__"
Tables joined by "INNER JOIN" (see below)
First field in every table is a primary key and primary key consist of only one field

and you can use these magical functions to access tables which support these conventions, and use standart DBI methods for other tables or for queries too complicated for these magical functions.

Additional features which we add to these magical functions are:

If table contain field _cdate, this field will be automagically set to NOW() when you Insert() or Replace() in this table.
If table contain field _mdate, this field will be automagically set to NOW() when you Insert(), Update() or Replace() this table.

Security
Now we have one command: $dbh->Update("Customer", \%Q); and this command can update ANY fields in table "Customer" (remember, all CGI params stored in %Q ?). And fields to update with values set by user running this CGI. Not good, yeah? If customer with id_cust=123 execute our CGI this way:
update_cust.cgi?id_cust=123&_balance=1000000
or this way:
update_cust.cgi?id_cust=1&_balance=0
Yes, right, this customer set own balance to 1000000 and set balance for other customer with id_cust=1 to 0. Terrible!

We solve this problem this way. At first, we do not accept CGI params started from "_":

%Q = CGI::Vars();
delete $Q{$_} for grep {m/^_/} keys %Q;
[download]

And we name all "need to protect" fields in table from "_" (like _balance). But, ... how to set balance when needed? For example, ADMIN _can_ set balance for any USER... No problem. Just do something like this before Update():

$Q{_balance} = $Q{Set_balance} if $Admin and $Q{Set_balance};
$dbh->Insert("Customer", \%Q);
[download]

Ok, but how about accepting id_cust=1 from customer 123 ? Again, no problem. Just do this way ($CustID we have from session):

$Q{_balance} = $Q{Set_balance} if $Admin and $Q{Set_balance};
$dbh->Insert("Customer", {%Q, id_cust=>$CustID});
[download]

We found this sort of "security checks" very good becouse programmer concentrate on important special fields like id_cust and _balance instead of typing "login", "email", and other basic field names many many times.

Comment on Magical SQL Select or Download Code

Replies are listed 'Best First'.
Re: Magical SQL by lachoy (Parson) on Apr 26, 2002 at 12:36 UTC
A couple of brief notes: 1. This seems MySQL-specific: not all databases have `INNER JOIN` or `LAST_INSERT_ID`. So you wind up building abstractions for these and bingo! you've reinvented Alzabo, SPOPS or many other SQL abstraction tools. 2. I agree with one of the previous posters that having column-level security is a bad idea -- it sounds like a good idea, but when you think about it, is it an idea that you really need? I implemented something with this four years ago. It worked, but it was a nightmare. And not just implementation-wise -- it's difficult for people to understand as well. 3. I have had quite a bit of experience in creating SQL from either metadata or parameters (see SPOPS, for starters). Over that time I have come to the conclusion that it's better to use straight SQL for creating non-straightforward queries. Why? For all the bitching about SQL, it's a standard. It's been around for years. There are large books and web tutorials written about it. In short, people know it. And your metatadata or parameters for creating these queries winds up being more confusing and less maintainable than the SQL it's supposed to generate. Sure people can learn how the metadata or parameters work, but isn't it a better idea to leverage the knowledge people already have about SQL? Just my 2c, etc. Chris M-x auto-bs-mode	[reply] [d/l] [select]
Re: Re: Magical SQL by ignatz (Vicar) on Apr 26, 2002 at 13:27 UTC
AMEN Often SQL is the only liferaft one can find when sifting through foreign code. ()-() \"/ `	[reply]
Re: Re: Re: Magical SQL by perrin (Chancellor) on Apr 26, 2002 at 15:34 UTC
ignatz++. Sing it, brother. Testify.	[reply]
Re: Re: Magical SQL by powerman (Friar) on Apr 26, 2002 at 13:27 UTC
Alzabo and SPOPS are huge! I don't want use them or reinventing them. Remember: "Keep it Simple, Stupid"! But I agree about MySQL-specific. This is not good, but I prefer simple and fast code instead of compatible with all world. So, if I can make it portable and non MySQL-specific without make it slow and complicated, i do it. And I don't try to make some sort of abstraction layer for any type of SQL queries. Non-straightforward queries MUST be done using straight SQL, and I say it already in main post. This idea is a lazy automagicall replacement only for simple SQL queries. And as I write before our survey show what >95% of used SQL queries in typical CGI scripts ARE simple and can be done using this interface. *In fact, size of my CGIs now 50% less with this interface!* And, please, explain what do you mean as column-level security?	[reply]
Re (3): Magical SQL by lachoy (Parson) on Apr 26, 2002 at 14:52 UTC
1. Alzabo and SPOPS are large because they implement a lot of functionality across different databases. (And in the case of SPOPS, across different datasources like LDAP.) They're also large because they handle many of the outlying cases that any homebrewed SQL generation mechanism will eventually have to deal with if it's used for a non-trivial application. Most large modules like these also ensure that you don't need to know everything about them to start using them. There are huge swaths of SPOPS that most people never need to care about. All they need to do is feed a configuration to a particular method and they can start accessing their database using straightforward methods. This doesn't prevent you from writing your own more complicated access, but it takes care of the common 90% usage. The KISS principle is excellent, but difficult problems often call for larger solutions. The fact that entire companies and productlines have been built on simplifying database access tells me this is a difficult problem. 2. That's fine if you're keeping this MySQL-specific. You might want to mention it :-) 3. As for non-straightforward queries using straight SQL, maybe we have different definitions for "non-straightforward". When I see something like: `%Q = ( country => 'Ukraine', age__ge => 25, join_date__date_ge => "-1 DAY", _balance__gt => 5.50, goods => ["Monitor LG", "CDROM 52x"], -order => ["_balance DESC", "age"], );` [download] This screams complicated. Based on my own painful excursions into generating complex SQL from this sort of information, I see the temptations that this can offer and am trying to warn you away from them. 4. Column-level security is when only certain users can update certain columns. Your statement that "ADMIN _can_ set balance for any USER..." leads me to believe you're doing this. Apologies if I was mistaken. Don't get me wrong, I'm not saying that what you're doing stinks. If it works for you and makes your code easier to maintain, fantastic. Good luck. Chris M-x auto-bs-mode	[reply] [d/l]
FIELD__func by powerman (Friar) on Apr 26, 2002 at 15:23 UTC
Re: Magical SQL by Juerd (Abbot) on Apr 26, 2002 at 10:55 UTC
You are announcing code without telling where the code can be found. That isn't really useful. If you want database abstraction, the ultimate DBI SQL abstracter is DBIx::Abstract. Adding per-column update privileges in the database seems like a bad idea to me, I'd just never give my argument hash directly to any database interface. Security isn't only about which fields you can or cannot update, you must also consider invalid values and insecure values. And when parsing/checking that, you might as well build yourself another hash while you're at it. - Yes, I reinvent wheels. - Spam: Visit eurotraQ.	[reply]
Re: Re: Magical SQL by powerman (Friar) on Apr 26, 2002 at 11:24 UTC
1) I'm not announcing code, I'm announcing only idea and some examples about how this may look like in real code. I have at this time code which realize this idea, but this code is not ready for publishing at this time (it was written in ~6 hours and commented only in Russian) and most part of this code WILL be changed in near future becouse we work up hardly on idea itself, functions interface and so on. 2) Checking for invalid and insecure values is out of scope of this idea and must be done as usual before calling Update(). I don't want waste code with building another hash with many not important fields. This is ugly. See this example: `# ... here user-supplied values was checked # variant 1, selecting all fields %newQ = ( id_cust=>$CustID, login => $Q{login}, pass => $Q{pass}, emai +l => $Q{email}, addr1 => $Q{addr1}, addr2 => $Q{addr2}, phone => $Q{p +hone}, ........... ); # variant 2, force only required fields %newQ = (%Q, id_cust=>$CustID);` [download] I think my way is right and secure becouse: Really required fields like login and pass will be checked before calling Update(), and if user do not send values for these fields Update() will not be called. If field like addr1 is not required, does not matter send user &addr1= or not if in table this field declared as NOT NULL. Variant 2 mean user can update any users's own field. This is flexible and secure. Update: 3) About DBIx::Abstract - my inferface is more suitable for lazy programmers becouse it is designed to parse all required information from single hash in form, which can be supplied by user running CGI. And for best security this hash NEVER contain part of SQL queries like "DATE_ADD(NOW(), INTERVAL ? HOUR)". Such type of queries can be done with help of special field names like FIELDNAME__date_add.	[reply] [d/l]
Allow override by Anonymous Monk on Apr 26, 2002 at 11:38 UTC
I think this is good, in fact I've just done something like it, but with one extra feature: it is a class, so for a particular application it can be overriden. Here is the routine that makes an input field in a form: # print an input element for a field, when replacing a record sub replaceField { my ($this, $name, $value, $type, $isKey) = @_; # check for special handler in subclass my $h = "replace_$name"; if ($this->can($h)) { $this->$h($name, $value, $type, $isKey); return }; # default processing my $attr = ''; # special attributes for tag if ('_' eq substr $name,0,1 ) {$attr .= 'READONLY ';}; print qq($name: <input name="$name" class="$name" value="$value" $attr type="text" ></input>); }	[reply]
Re: Magical SQL by mpeppler (Vicar) on Apr 26, 2002 at 15:09 UTC
Tempting as it is I don't much like this sort of interface. The advantage is perceived simplification of writing the CGI code (or other front-end code). In reality it has a tendency to obscure what you are doing, and in particular when the next person comes and needs to debug your code there ends up being a lot of magic going on. Personally I much prefer mapping a database action (say, insert a newe user record) to a perl subroutine. This adds one level of perl subroutine calls, but makes the code a lot easier to read and understand, and isolates the application code from the database. Michael	[reply]
Re: Magical SQL by abstracts (Hermit) on Apr 26, 2002 at 10:37 UTC
Hello This is a request for all people who're down voting this node to please consider contributing the reasons for the down voting. Are there security problems not addresses? Performance issues? Your contributions would help us all. Update:: Sifmole and fellow Anonymous Monk: Please check who you're posting the reply to. I'm not the one who posted the original node, so, please redirect your comments.	[reply]
Re: Re: Magical SQL by Sifmole (Chaplain) on Apr 26, 2002 at 12:15 UTC
Okay -- Annoucing code, then leaving it out is just annoying. If the code isn't completely commented, post a draft. If what you have is not even able to be posted as a draft -- the hold off on posting a little bragfest until you have something to show.	[reply]
Code, code, code... by powerman (Friar) on Apr 26, 2002 at 12:45 UTC
Again, I'm announcing IDEA, not code!!! If even perl monks don't see the difference.... :-\ Where I can post an idea without code if not in Meditations??? I hope you help me find important weakness of this idea before it will be realized and released! Becouse after release make significant changes are much more difficult - these changes must be compatible with previous version! I'm awaiting from you questions like "is it possible to do ... with this interface?" or "how will be solved ... problem with security", etc. Or, at least, replies like this "I don't like this idea becouse ...". You can call me "romantic"... ;-)	[reply]
Re: Magical SQL by dws (Chancellor) on Apr 26, 2002 at 18:13 UTC
One of the tradeoffs this scheme makes is simplicity against database overhead, in the form of an extra query to get table descriptions. There are some environments where this won't fly. Those environments are characterized by being sufficiently database intensive that they have a separate database group responsible for physical layout and performance. If load on the DB is getting critical oversight, deploying a scheme that uses an extra table description query per CGI invocation will sooner or later get you a visit from the DB police. Your example suggests that you're running your own MySQL, so you're probably O.K.	[reply]
Extra queries by powerman (Friar) on Apr 26, 2002 at 18:51 UTC
Yes, we know about this. And current solution is create file with current database structure and use this file instead of DESC queries. Database structure seldom changed when project is ready, and after these seldom changes recreate file with database structure is not very difficult.	[reply]
Re: Extra queries by autarch (Hermit) on Jun 19, 2002 at 02:38 UTC
Yeap, Alzabo does this too. Wheels ... reinventing ... But hey, it's your choice. You can write your own tool which may or may not get used by others, to which you'll have to add all your own new features, which you'll have to debug. And you _will_ add more features. I guarantee it. When I first started Alzabo, I didn't need feature X. Then later I did, and I added it. Or someone else asked for it. It's a better tool for it, but it is big. But it's closer to _complete_ than anything anyone could whip up from scratch (unless scratch means 2.5 years of development!). Or you can use an existing, supported tool which has other users and for which you can simply submit bug reports, like say Alzabo, SPOPS, or Class::DBI.	[reply]