Re^4: Moving SQL from perl to Stored Procedures

Relational logic and database design is hard to master.
That depends what you mean by "master". But a good working understanding of it is within the grasp of any good programmer.

Without for a moment doubting your grasp of relational theory and practice, I think that this is a pervasive--and generally wrong--assumption.

Here's another quote from Joel Spolsky. Funnily enough from the same article.

The SQL language is meant to abstract away the procedural steps that are needed to query a database, instead allowing you to define merely what you want and let the database figure out the procedural steps to query it. But in some cases, certain SQL queries are thousands of times slower than other logically equivalent queries. A famous example of this is that some SQL servers are dramatically faster if you specify "where a=b and b=c and a=c" than if you only specify "where a=b and b=c" even though the result set is the same. You're not supposed to have to care about the procedure, only the specification. But sometimes the abstraction leaks and causes horrible performance and you have to break out the query plan analyzer and study what it did wrong, and figure out how to make your query run faster.

A DBA, familiar with the type and version of the RDBMS he is using, can be expected to not only know this kind of detail, but also when it changes with releases/updates to the installation. And, when those changes occur, to scan those queries within his domain of responsibility looking for those likely to be affected by the change and schedule testing for possible improvements.

Even if the original application development team has moved on, and even the original DBA(s), those queries in existing applications that may be affected by upgrades are within his remit. Queries embedded within application code are not.

There are many similar areas where a DBA can be expected to have specialist knowledge beyond the scope of your average AP.

If there is an impedence mismatch between how the application works and how the database works, that mismatch is going to be a constant source of problems. As Joel Spolsky put it, abstractions leak. And the more that the abstraction tries to look different than the underlying reality, the more leakage there is.

The keyword there is If. Why would there be an impedance mismatch? Or rather, why is this any more likely with SPs?

Whatever API you wrap around your embedded SQL (eg. ... one possible option for how to build that application layer, namely use Class::DBI.), can equally be wrapped around calls to SPs. As you said later:

There has to be some sort of interface, and you ideally want that interface to be as cleanly defined on each side as possible.

The reasons for putting the business logic into SPs are:

You remove the dependency of the application layers from the DB schema.
You allow for the schema to accommodate cross-application requirements (where applicable).
You allow the application to benefit from expert maintenance in the specialist areas of DB design, proprietaries, mid-life changes and updates.

I see no extra scope for being vulnerable to Joel's The Law of Leaky Abstractions, and having just re-read the article and the context in which you have quoted from it, I wonder if maybe you misunderstood him?

The way that you have put this presupposes that there are many independent applications accessing the same data, ...

I did say "DB design transcends applications for all but the most trivial.". Obviously, standalone applications are not affected this way, so using a ~~trivial~~ standalone application as counter argument is "straw man".

I don't know what you do, or who you do it for. However I happen to handle reporting needs for a high performance website. And I to know enough about databases to understand the difference between a transactional database and a data warehouse.
If you attempt to run a large query against our transactional database,

Three points:

What I do (or have done) and who I do (did) it for are irrelevant. Let's not get into a pissing contest about experience here.
Either the argument stands up or it doesn't.
I said "single database". I didn't say "only database".
There's nothing wrong with placing transaction tables in a separate database and having them replicated back to the all encompassing offline DB. The point is that in order to allow the tables be replicated into the offline DB and permit ad-hoc investigative queries in conjunction with other application data there, the table designs need to be consistent with the wider schema.
Too often, datawarehousing only brings together a subset or summary representation of the live data, which precludes or inhibits benefits be derived from cross-application tie-ins.
Not all applications need to realise the full potential of the RDBMS.

Now you take a cheap shot about mixing Perl and HTML.

I didn't and don't see it as a cheap shot. I actually thought you would appreciate that reasoning as I know you to be an advocate of decoupled interfaces.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

Comment on Re^4: Moving SQL from perl to Stored Procedures

Replies are listed 'Best First'.
Re^5: Moving SQL from perl to Stored Procedures by tilly (Archbishop) on Sep 12, 2006 at 19:29 UTC
We disagree on what a good working understanding is. By that I don't mean enough to know how to optimize corner cases, identify bugs, or handle truly advanced concepts. By working understanding I mean enough to come up with sensible designs, sensible layouts, and construct working queries that mean what you think they mean. After that if you need to optimize and don't know enough to do it, then you can bring in the expert. If you have a good database and have shown basic common sense, this is generally a fairly straightforward process. That's true in general. But it is even more true with databases. Because a good DBA working with a semi-decent database doesn't even need to see your code to find the bad queries! How do they accomplish this miracle? Why quite simply, the database has profiling tools to tell the DBA where it is spending its time, and bad queries will show up as one of the top running statements. Now that the DBA is armed with the actual query that is misbehaving, it is a fairly simple matter for them to come to the application developers and say, "Find me where you are generating this query." A few invocations of grep later, and you're in a position to start analyzing it. The Pareto principle tells you that you only need to do a little bit of this optimization to clear out the big offenders. Practice bears this theory out. Incidentally I've found that doing this kind of optimization requires application programmers and DBAs to cooperate. DBAs and application programmers each bring something to the table. A DBA might notice, for instance, that a better index would help. Alternately the application programmer might add a caching layer to reduce how often they are querying the database. Optimization is an argument for cooperation, not an argument for making one side or the other responsible for the problem. Now the next point is that you are big on having multiple applications accessing the same database. I pointed out one example of an application that can reasonably assume it is the only one accessing its database, namely RT. It can assume that because it installs a database as part of its installation. You dismiss that as "trivial". Well most programming is actually trivial. And there are a lot of programmers who are employed in producing "trivial" applications. Don't believe me? Look at the popularity of MS Access as a development platform! Just because something is trivial doesn't mean that it is unimportant. (History note. Back in the 90's, O'Reilly had a frustrating problem. Everyone was producing bad CGI books and making a fortune. O'Reilly had all of the top Perl expertise, and nobody wanted to do a CGI book! O'Reilly knew it would sell well, but to all of the good Perl programmers, the web was "trivial". Take in parameters, spit out text. Yeah, as a programming exercise this is trivial, but it is still important...) But your claim that only trivial applications can assume they are the only ones accessing the database is wrong in more ways than that. Because I only named RT as an example that you are likely to be familiar with. There are lots of other reasons why only the application will access the database. Another example is a high performance website. A high performance website, by nature of what it does, stresses databases to their limits. In order for it not to fall over, you wind up using every trick you can to make things go smoothly. You said, DB design transcends applications for all but the most trivial. Well then, high performance websites are among the "most trivial" applications, because any serious mismatch between the application design and the database layout will cause a ton of extra load that will be a huge problem. In fact not only do high performance websites routinely assume they are the only ones touching the database, as you scale them you often have to have multiple databases to keep up with the website. I don't think I'm giving away anything shocking to reveal that, say, eBay does not keep all of their transactional data in one database. Instead they have multiple parallel databases and requests are sent to the correct one. Furthermore each database is replicated multiple times, and some queries are directed to read-only copies while others are directed to the copies you can write to. They do this because there is simply no way with current hardware to scale to their kind of site. (If you scale further, a little company called Google found that databases couldn't do what they wanted to do. Period.) So not only does that application assume it is the only thing that will touch the database, but the application stress is such that the database had to undergo a massive redesign in order to scale. And now nothing but that application can be expected to be able to figure out what data is where. Incidentally I suspect (though admittedly I don't know for sure) that eBay does not have any single database which has all of the data from the company. Just keeping such a database up to date would be a nightmare. Rent.com is smaller - a lot smaller. But we're big enough and growing fast enough that we are starting down the same path already. Now our backgrounds are relevant here. Yes, in an ideal world arguments should just stand on their own. But sometimes whether or not an argument makes sense depends on whether you have practical experience in a case where it doesn't. I happen to have that experience. I strongly suspect that you don't. Which is why in addition to pointing out why your assumptions may break in a particular environment, I also pointed out that my experience and role in that environment gives me a lot of practical experience to speak from. You can choose to disregard that experience, but I think it is relevant to the point at hand. Incidentally this is not the only environment I know of where applications can legitimately assume they are the only ones accessing data. There are lots of others that I know of, but I just don't have direct experience with them. To name just one, many types of data processing businesses regularly manipulate large amounts of customer data. By contract they need to guarantee that this customer data doesn't go anywhere the customer didn't ask for it to go. This legal requirement is much easier to meet if they make sure that that data stays on its own databases in a locked down environment where only the custom application for that customer can touch it... And a final note. My comment about your cheap shot is based on this paragraph: No web programmer worth his salt would advocate the mixing of Perl and HTML. Why advocate the mixing of Perl and SQL. You make it sound as if I was advocating the mixing of Perl and SQL. That I wasn't doing, and it seemed to me to be a cheap shot to dismiss what I had to say in such a way. Upon reflection I realized that I should have given you the benefit of the doubt - you might not have understood that I wasn't advocating that. But hopefully if you didn't then, you do now.	[reply]

Replies are listed 'Best First'.

Re^5: Moving SQL from perl to Stored Procedures
by tilly (Archbishop) on Sep 12, 2006 at 19:29 UTC

a good working understanding

After that if you need to optimize and don't know enough to do it, then you can bring in the expert. If you have a good database and have shown basic common sense, this is generally a fairly straightforward process.

That's true in general. But it is even more true with databases. Because a good DBA working with a semi-decent database doesn't even need to see your code to find the bad queries! How do they accomplish this miracle? Why quite simply, the database has profiling tools to tell the DBA where it is spending its time, and bad queries will show up as one of the top running statements. Now that the DBA is armed with the actual query that is misbehaving, it is a fairly simple matter for them to come to the application developers and say, "Find me where you are generating this query." A few invocations of grep later, and you're in a position to start analyzing it.

The Pareto principle tells you that you only need to do a little bit of this optimization to clear out the big offenders. Practice bears this theory out.

Incidentally I've found that doing this kind of optimization requires application programmers and DBAs to cooperate. DBAs and application programmers each bring something to the table. A DBA might notice, for instance, that a better index would help. Alternately the application programmer might add a caching layer to reduce how often they are querying the database. Optimization is an argument for cooperation, not an argument for making one side or the other responsible for the problem.

lot

Don't believe me? Look at the popularity of MS Access as a development platform! Just because something is trivial doesn't mean that it is unimportant. (History note. Back in the 90's, O'Reilly had a frustrating problem. Everyone was producing bad CGI books and making a fortune. O'Reilly had all of the top Perl expertise, and nobody wanted to do a CGI book! O'Reilly knew it would sell well, but to all of the good Perl programmers, the web was "trivial". Take in parameters, spit out text. Yeah, as a programming exercise this is trivial, but it is still important...)

But your claim that only trivial applications can assume they are the only ones accessing the database is wrong in more ways than that. Because I only named RT as an example that you are likely to be familiar with. There are lots of other reasons why only the application will access the database.

Another example is a high performance website. A high performance website, by nature of what it does, stresses databases to their limits. In order for it not to fall over, you wind up using every trick you can to make things go smoothly. You said, DB design transcends applications for all but the most trivial. Well then, high performance websites are among the "most trivial" applications, because any serious mismatch between the application design and the database layout will cause a ton of extra load that will be a huge problem.

In fact not only do high performance websites routinely assume they are the only ones touching the database, as you scale them you often have to have multiple databases to keep up with the website. I don't think I'm giving away anything shocking to reveal that, say, eBay does not keep all of their transactional data in one database. Instead they have multiple parallel databases and requests are sent to the correct one. Furthermore each database is replicated multiple times, and some queries are directed to read-only copies while others are directed to the copies you can write to. They do this because there is simply no way with current hardware to scale to their kind of site. (If you scale further, a little company called Google found that databases couldn't do what they wanted to do. Period.)

So not only does that application assume it is the only thing that will touch the database, but the application stress is such that the database had to undergo a massive redesign in order to scale. And now nothing but that application can be expected to be able to figure out what data is where.

Incidentally I suspect (though admittedly I don't know for sure) that eBay does not have any single database which has all of the data from the company. Just keeping such a database up to date would be a nightmare.

Rent.com is smaller - a lot smaller. But we're big enough and growing fast enough that we are starting down the same path already.

Now our backgrounds are relevant here. Yes, in an ideal world arguments should just stand on their own. But sometimes whether or not an argument makes sense depends on whether you have practical experience in a case where it doesn't. I happen to have that experience. I strongly suspect that you don't. Which is why in addition to pointing out why your assumptions may break in a particular environment, I also pointed out that my experience and role in that environment gives me a lot of practical experience to speak from. You can choose to disregard that experience, but I think it is relevant to the point at hand.

Incidentally this is not the only environment I know of where applications can legitimately assume they are the only ones accessing data. There are lots of others that I know of, but I just don't have direct experience with them. To name just one, many types of data processing businesses regularly manipulate large amounts of customer data. By contract they need to guarantee that this customer data doesn't go anywhere the customer didn't ask for it to go. This legal requirement is much easier to meet if they make sure that that data stays on its own databases in a locked down environment where only the custom application for that customer can touch it...

No web programmer worth his salt would advocate the mixing of Perl and HTML. Why advocate the mixing of Perl and SQL.

[reply]