in reply to Moving SQL from perl to Stored Procedures

I disagree with calling this a best practice without qualifiers or more details.

There are a lot of arguments on both sides when it comes to moving logic in and out of the database. Either decision is very defensible. In a particular situation it may make sense to choose one over the other, but I do not believe that either choice is universally better.

Furthermore, if you're going to move logic into the database, then there are additional things you need to do as well. If you're going to recommend the first, you should at least mention the second.

What are some of the trade-offs?

Against keeping logic in databases:

  1. Who is developing your application? Probably programmers. What are they best at? Probably their application language (given this site, likely to be Perl). Why make them program in another language?
  2. You present SQL injection attacks as a key problem to solve. That has many solutions. For instance you can use placeholders. Or you can use an object-relational mapper such as Class::DBI.
  3. Stored procedure languages tend to be quite bad. Business logic etc. can often be expressed far more clearly outside of the database than inside it.
  4. Having significant logic in stored procedures makes code releases significantly more complex.
  5. If multiple programmers are working on logic in stored procedures in the database, they are more likely to accidentally step on each other's toes.
  6. If there is any chance that you'll need to change databases, the more logic in the database, the worse it is. Very relevant to this point is that proprietary database vendors actively attempt to charge you as close to your cost of switching databases as possible. Those negotiations may go better for you if you really can switch databases easily. (See Information Rules for an explanation of why they do this.)
  7. The database tends to be the hardest part of your system to scale. Verifying logic takes work for your database, and may cause it to run into problems sooner. (Note "may", scalability can go either way.) Yes, I know that database vendors claim that it is more scalable to put logic in the database. It is also in their financial interest that you do so...
  8. As the anonymous monk pointed out above, when you leave some logic in the application and move other logic into the database, you are tightly coupling two pieces of code that are very far from each other. In general, this is an unwise thing to do.
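For point 2 in the list above, placeholders are close to a one-line change. A minimal sketch using DBI against a throwaway in-memory SQLite database (assuming DBD::SQLite is available):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Throwaway in-memory database purely for illustration.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');

# The placeholder keeps the value out of the SQL text entirely, so a
# name containing a quote cannot change the meaning of the statement.
$dbh->do('INSERT INTO users (name) VALUES (?)', undef, "O'Brien");

my ($count) = $dbh->selectrow_array(
    'SELECT COUNT(*) FROM users WHERE name = ?', undef, "O'Brien");
print "matched $count row(s)\n";   # matched 1 row(s)
```

With a placeholder the driver sends the value separately from the SQL text, so quoting is never the application's problem.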

For putting logic in databases:

  1. Security. While SQL injection attacks can be prevented in other ways, if someone manages to break into your webserver in another way, having the database locked down makes it more difficult for them to accomplish anything.
  2. When multiple applications access the same database (one of those "applications" may be ad hoc queries), you can't guarantee that they will all implement the same business logic unless it is in the database.
  3. Performance. It is often possible to code business logic to run faster in the database than it will outside. The biggest win comes from avoiding unnecessary round trips between machines. Some may wonder how this squares with my scalability point above. To understand it, remember that scalability is like a Mack truck - a sports car may get your groceries home faster, but the truck can move more groceries.
  4. Avoid race conditions. When logic is out of the database, you sometimes have to choose between accepting a race condition and holding a lock that can be a scalability hit. (See my comment before on scalability going either way...) The same locking done within the database is less of a scalability issue because you do away with the latency of round trips between the database and the client.
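The race condition in point 4 is easy to see in code. A minimal sketch using DBI with an in-memory SQLite database (assuming DBD::SQLite is available): the commented-out version computes the new balance in Perl, leaving a window between the SELECT and the UPDATE, while the single UPDATE does the arithmetic inside the database:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });
$dbh->do('CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)');
$dbh->do('INSERT INTO account VALUES (1, 100)');

# Racy: another client could change the balance between these two statements.
#   my ($bal) = $dbh->selectrow_array('SELECT balance FROM account WHERE id = 1');
#   $dbh->do('UPDATE account SET balance = ? WHERE id = 1', undef, $bal - 30);

# Atomic: the read-modify-write happens inside the database as one statement,
# with no client round trip in the middle to race against.
$dbh->do('UPDATE account SET balance = balance - ? WHERE id = 1', undef, 30);

my ($balance) = $dbh->selectrow_array(
    'SELECT balance FROM account WHERE id = 1');
print "balance: $balance\n";   # balance: 70
```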

Now looking at those lists, what stands out? The first thing that stands out for me is that there is no single "killer" point that makes it obvious that one is always the right way to go and the other is not. The second is that the list of reasons against logic in the database is longer than the list of reasons to put it there. The third is that the reasons against putting logic in the database are mostly addressable, while the benefits of putting it in the database are mostly things you're stuck with.

Therefore my conclusion is that, depending on the project, either choice can be reasonable. But if you choose to put logic in the database, then you should address the problems that can arise. Here is a short list of things that I think you should do.

If you don't do at least some (and probably most, if not all) of those things, then I think it is a bad idea to move logic into the database. If you do all of those things and you actively need the benefits from moving your logic there, then you are probably making a wise choice.


Replies are listed 'Best First'.
Re^2: Moving SQL from perl to Stored Procedures
by BrowserUk (Patriarch) on Sep 10, 2006 at 02:14 UTC
    The first thing that stands out for me is that there is no single "killer" point ...

    The killer points for me in favour of stored procedures over embedded SQL are:

    • Relational logic and database design is hard to master.

      A good DBA (a rare breed) can save the company their salary many times over. Very few general programmers become really adept at DB design.

    • Applications need only the results, not how to obtain them.

      DB design transcends applications for all but the most trivial. A DBA can provide a cross-application overview and arbitrate between their disparate requirements. Application programmers are usually too caught up in their current application's requirements to step back and subjugate their perception of their application's needs in favour of another's without such arbitration.

      Isolating your DB design from your applications, and vice versa, is a sound application of the decoupling principle at so many levels. New applications come on line that require changes to the DB schema. Other applications should not have to change to accommodate this. The stored procedures are modified to account for the new schema and existing applications continue to run without change.

    • Business code and data should live together and be application independent.

      Tax rates, depreciation terms and a whole gamut of other typical business calculations, rules and logic live outside of the application's remit, and are subject to change by government edicts and legislation.

      When they change, they change across the board for all affected applications. Changes should be implemented and tested once, not in every application.

    • You do not realise the true benefits of an RDBMS until all your business data exists in a single database.

      It is only once it is possible to perform relational logic across the entire range of company data assets that they really begin to earn their keep.

      Warehousing solutions where the data from individual databases is replicated wholesale into secondary conglomerations miss opportunities for ad-hoc queries that the conglomeration process does not support.

      Similarly, warehousing that attempts to perform cross-DB-boundary searches via meta-data and meta-searching is rarely, if ever, capable of keeping up with changes and accommodating speculative ad-hoc queries. There is usually a fairly substantial delay between the notion of a cross-DB query being useful and it becoming available; and a not inconsiderable expense in making it available, which requires a costed business case to be made before budget can be allocated. Those delays, costs and red tape suppress discovery.
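    SQLite has no stored procedures, so as a hedged stand-in the sketch below uses a view to make the same decoupling point from the bullets above: the database-side interface absorbs a schema change and the application code runs unchanged. Table and column names are invented, and DBD::SQLite is assumed to be available; with MySQL or Oracle the application would issue a CALL to a procedure instead:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

# Schema change: "name" was split into two columns in a redesign.
$dbh->do('CREATE TABLE customer (id INTEGER PRIMARY KEY,
                                 given TEXT, family TEXT)');
$dbh->do(q{INSERT INTO customer VALUES (1, 'Ada', 'Lovelace')});

# The database-side interface is updated to match the new schema...
$dbh->do(q{CREATE VIEW customer_name AS
           SELECT id, given || ' ' || family AS name FROM customer});

# ...and the application, which only knows the interface, runs unchanged.
my ($name) = $dbh->selectrow_array(
    'SELECT name FROM customer_name WHERE id = ?', undef, 1);
print "customer: $name\n";   # customer: Ada Lovelace
```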

    No web programmer worth his salt would advocate the mixing of Perl and HTML. Why advocate the mixing of Perl and SQL?

    The obligatory analogy: GPs diagnose, prescribe and manage year-on-year health care, but they defer specialist treatment to specialists. Eye, brain, heart specialists, etc.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Colour me unconvinced.

      • Relational logic and database design is hard to master.

        That depends what you mean by "master". But a good working understanding of it is within the grasp of any good programmer. If you are working on an application that heavily depends on a relational database, then I'd consider it your responsibility to gain that understanding.

        This doesn't mean that you'd replace a DBA. There is a lot of detail about, for instance, when to use what kind of database and how to physically lay out tables, which developers should not be expected to understand. But a developer should be able to understand how to structure data in a database, and how to access that data.

      • Applications need only the results, not how to obtain them.

        That's a wonderful theory. It is not so wonderful in practice.

        If there is an impedance mismatch between how the application works and how the database works, that mismatch is going to be a constant source of problems. As Joel Spolsky put it, abstractions leak. And the more that the abstraction tries to look different than the underlying reality, the more leakage there is. Therefore for a straightforward CRUD application, the database design dictates a lot of how the application should naturally be designed. (Which is one reason why it is good for developers to understand enough about database design to be able to understand that.)

      • Business code and data should live together and be application independent.

        The way that you have put this presupposes that there are many independent applications accessing the same data, and those applications are tied to some bigger business. That was one of the situations where I stated one should move logic into the database.

        However that presupposition is often not the case. For instance consider an application like RT. RT sets up a private database as part of the installation. It is very reasonable for it to assume that it is the only thing accessing that database. If you break this assumption, that is your problem. It is also not the case that RT's design depends in any way on the corporation where it has been installed. So your presupposition completely fails in the case of this application. (Note: RT gives you lots of ways to export its data, and APIs to manipulate RT. Integrating RT's data into other databases is not hard.)

      • You do not realise the true benefits of an RDBMS until all your business data exists in a single database.

        You do not realize the true limitations of an RDBMS until all your business data exists in a single database and that database melts down.

        I don't know what you do, or who you do it for. However I happen to handle reporting needs for a high-performance website. And I know enough about databases to understand the difference between a transactional database and a data warehouse.

        If you attempt to run a large query against our transactional database, you will generally fail. Data is changing too fast, and before you get very far you'll get a "snapshot too old" error. (If you guessed from that error that I'm talking about Oracle, give yourself a gold star.) But even if you go against the data warehouse, many ad hoc queries are simply not directly feasible. If you have experience running complex queries against billions of rows of data, you'll understand. Reality is far short of the ideal of just putting everything into a database and letting it take care of the details.

      Now you take a cheap shot about mixing Perl and HTML. But if you go back and look at what I've recommended, I never did that. I am talking about how much logic is pushed into the database. If you say "very little", that doesn't mean that you will randomly mix Perl and SQL. Instead in a sensibly designed application, you'll have an application layer to handle that interface. The role of that application layer is conceptually similar to a templating system - it creates a bridge from Perl to something else that allows you to avoid mixing the two. I even suggested one possible option for how to build that application layer, namely use Class::DBI.

      In fact you'll want that layer no matter where you choose to put all of that logic. There has to be some sort of interface, and you ideally want that interface to be as cleanly defined on each side as possible. That is basic common sense. What is not obvious is how much logic to put where.
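      What such a layer can look like in practice: the sketch below hand-rolls a tiny class rather than using Class::DBI so that it stays self-contained, and the table and method names are invented. All SQL lives in the one package, and the rest of the program talks only to methods (DBD::SQLite assumed):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# A hand-rolled stand-in for what Class::DBI automates: every piece of
# SQL lives in this package, behind a small method-based interface.
package UserTable;

sub new {
    my ($class, $dbh) = @_;
    return bless { dbh => $dbh }, $class;
}

sub add {
    my ($self, $name) = @_;
    $self->{dbh}->do('INSERT INTO users (name) VALUES (?)', undef, $name);
}

sub name_of {
    my ($self, $id) = @_;
    my ($name) = $self->{dbh}->selectrow_array(
        'SELECT name FROM users WHERE id = ?', undef, $id);
    return $name;
}

package main;

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });
$dbh->do('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');

# The application never sees SQL, only the interface.
my $users = UserTable->new($dbh);
$users->add('tilly');
print 'first user: ', $users->name_of(1), "\n";   # first user: tilly
```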

        Relational logic and database design is hard to master.

        That depends what you mean by "master". But a good working understanding of it is within the grasp of any good programmer.

        Without for a moment doubting your grasp of relational theory and practice, I think that this is a pervasive--and generally wrong--assumption.

        Here's another quote from Joel Spolsky. Funnily enough from the same article.

        The SQL language is meant to abstract away the procedural steps that are needed to query a database, instead allowing you to define merely what you want and let the database figure out the procedural steps to query it. But in some cases, certain SQL queries are thousands of times slower than other logically equivalent queries. A famous example of this is that some SQL servers are dramatically faster if you specify "where a=b and b=c and a=c" than if you only specify "where a=b and b=c" even though the result set is the same. You're not supposed to have to care about the procedure, only the specification. But sometimes the abstraction leaks and causes horrible performance and you have to break out the query plan analyzer and study what it did wrong, and figure out how to make your query run faster.

        A DBA, familiar with the type and version of the RDBMS he is using, can be expected not only to know this kind of detail, but also to know when it changes with releases or updates to the installation. And, when those changes occur, to scan those queries within his domain of responsibility looking for those likely to be affected by the change and schedule testing for possible improvements.

        Even if the original application development team has moved on, and even the original DBA(s), those queries in existing applications that may be affected by upgrades are within his remit. Queries embedded within application code are not.

        There are many similar areas where a DBA can be expected to have specialist knowledge beyond the scope of your average application programmer.
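        The pair of logically equivalent queries Joel describes can be written out concretely. A sketch with invented tables (DBD::SQLite assumed); both forms return the same rows, and it is the plan differences between them on particular engines and versions that a DBA would be expected to track:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });
for my $t (qw(t1 t2 t3)) {
    $dbh->do("CREATE TABLE $t (x INTEGER)");
    $dbh->do("INSERT INTO $t VALUES (?)", undef, $_) for 1 .. 5;
}

# Logically equivalent queries: the second adds the redundant a=c predicate.
my $short = $dbh->selectall_arrayref(
    'SELECT t1.x FROM t1, t2, t3 WHERE t1.x = t2.x AND t2.x = t3.x');
my $long  = $dbh->selectall_arrayref(
    'SELECT t1.x FROM t1, t2, t3
     WHERE t1.x = t2.x AND t2.x = t3.x AND t1.x = t3.x');

# Same result set either way; only the query plan may differ.
printf "short form: %d rows, long form: %d rows\n",
       scalar @$short, scalar @$long;
```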

        If there is an impedance mismatch between how the application works and how the database works, that mismatch is going to be a constant source of problems. As Joel Spolsky put it, abstractions leak. And the more that the abstraction tries to look different than the underlying reality, the more leakage there is.

        The keyword there is If. Why would there be an impedance mismatch? Or rather, why is this any more likely with SPs?

        Whatever API you wrap around your embedded SQL (e.g. "... one possible option for how to build that application layer, namely use Class::DBI.") can equally be wrapped around calls to SPs. As you said later:

        There has to be some sort of interface, and you ideally want that interface to be as cleanly defined on each side as possible.

        The reasons for putting the business logic into SPs are:

        • You remove the application layers' dependency on the DB schema.
        • You allow for the schema to accommodate cross-application requirements (where applicable).
        • You allow the application to benefit from expert maintenance in the specialist areas of DB design, proprietaries, mid-life changes and updates.

        I see no extra scope for being vulnerable to Joel's The Law of Leaky Abstractions, and having just re-read the article and the context in which you have quoted from it, I wonder if maybe you misunderstood him?

        The way that you have put this presupposes that there are many independent applications accessing the same data, ...

        I did say "DB design transcends applications for all but the most trivial". Obviously, standalone applications are not affected this way, so using a trivial standalone application as a counter-argument is a "straw man".

        I don't know what you do, or who you do it for. However I happen to handle reporting needs for a high-performance website. And I know enough about databases to understand the difference between a transactional database and a data warehouse.

        If you attempt to run a large query against our transactional database,

        Three points:

        1. What I do (or have done) and who I do (did) it for are irrelevant. Let's not get into a pissing contest about experience here.

          Either the argument stands up or it doesn't.

        2. I said "single database". I didn't say "only database".

          There's nothing wrong with placing transaction tables in a separate database and having them replicated back to the all-encompassing offline DB. The point is that in order to allow the tables to be replicated into the offline DB and permit ad-hoc investigative queries in conjunction with other application data there, the table designs need to be consistent with the wider schema.

          Too often, data warehousing only brings together a subset or summary representation of the live data, which precludes or inhibits benefits being derived from cross-application tie-ins.

        3. Not all applications need to realise the full potential of the RDBMS.

        Now you take a cheap shot about mixing Perl and HTML.

        I didn't and don't see it as a cheap shot. I actually thought you would appreciate that reasoning as I know you to be an advocate of decoupled interfaces.


Re^2: Moving SQL from perl to Stored Procedures
by imp (Priest) on Sep 10, 2006 at 02:24 UTC
    Excellent points tilly - thank you for the feedback.

    I agree that stored procedures do not fit every situation, and that there are new issues that arise when you decide to embrace that solution - the biggest one for me is coupling. That said, there are many situations where they are the best solution available.

    By the way, the intent of the original post was not to suggest that stored procedures are a best practice - it was to discuss best practices when using stored procedures.