Ovid has asked for the wisdom of the Perl Monks concerning the following question:

Right now, our code is gradually getting cleaner and we do have hints of separation here and there, but we have decided we need to focus on developing our model appropriately. However, rather than try to refactor the awful database design, we want to build a clean object layer (the model) on top, have tests for it, and then drive everything through that layer so that later we are in a better position to fix the database. That leads to an interesting design problem.

One primary design goal is, as much as possible, to expose only object responsibilities, not data. We're considering either an ORM or the phrasebook pattern. Because our database is designed so poorly, while I do like ORMs, I'm thinking that perhaps a phrasebook would be a better way to go. That being said, ORMs can save a lot of agony if done properly (and I find 'em quicker to implement).

With an ORM, it should be easy to add custom, hand-written SQL for the numerous cases where our bad database design creates an obstacle for an ORM. Having used Class::DBI in the past, I think it could be a good fit for this. Not having used DBIx::Class, I can't comment on it, but at least one of our programmers has said his experience with it suggests that there's a large performance hit on the initial loading of large schemas. Because this code is running as CGI and not mod_perl (and is in no shape to be converted), we can't afford that performance hit. I also hear great things about Rose::DB::Object, but I also haven't played with that yet.
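
For instance, Class::DBI's set_sql lets you bolt hand-written SQL onto a class and get a matching search method generated for you. A minimal sketch, with an invented MyApp::Order class and made-up column names:

    package MyApp::Order;
    use strict;
    use warnings;
    use base 'Class::DBI';

    __PACKAGE__->connection('dbi:mysql:database=myapp', 'user', 'pass');
    __PACKAGE__->table('orders');
    __PACKAGE__->columns(All => qw(id cust_id total placed_on));

    # Hand-written SQL for a query the generated accessors can't express;
    # Class::DBI builds a search_recent_for_customer() method from this.
    __PACKAGE__->set_sql(recent_for_customer => qq{
        SELECT __ESSENTIAL__
        FROM   __TABLE__
        WHERE  cust_id = ?
        AND    placed_on > DATE_SUB(NOW(), INTERVAL 30 DAY)
    });

    # Callers then write:
    # my @orders = MyApp::Order->search_recent_for_customer($cust_id);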

So given the desire to create a model of a poorly designed database and the need to add custom SQL in various bits of it (and in some places, even post-processing with Perl to get reasonable data), what approach would you take? An ORM (and if so, which one), a phrasebook, or something else altogether? I realize much of this is vague and I'll be researching it myself, but suggestions are welcome.

The database is MySQL 4.1.19.

Cheers,
Ovid


Re: Creating a model of a bad database
by perrin (Chancellor) on Dec 12, 2006 at 15:50 UTC

    I think it depends on how important it is to hide the real schema. The ORM tools all give you a lot of leeway in terms of how to fetch your data, including custom SQL. However, with Class::DBI and Rose::DB::Object, you will still end up with objects that map to individual tables. If you want to hide the fact that one of your important objects is actually composed of data from several tables, that could be bad.

    DBIx::Class tries a bit harder with this. It has a concept of result sets, and they can be updated. What I don't know -- because I haven't used it -- is whether it can handle inserting and updating objects that correspond to a complex join of multiple tables, especially if the logic is so complicated it involves custom SQL and some Perl processing. I think you'd better ask that on the DBIx::Class mailing list.

    Ultimately, there's a good chance you'll have to write your own application-specific mapper if you really want to hide all the database details. It's really hard to do with a generic tool. I've done that before, and I would think twice before doing it again, since it took a lot of work and ended up lacking some of the good features that are available from ORMs now.

Re: Creating a model of a bad database
by ph713 (Pilgrim) on Dec 12, 2006 at 17:09 UTC
    I would go the ORM route myself (but then again, I'm a pretty biased party in this discussion, as I work on DBIx::Class). You can still add custom methods and custom SQL as necessary with DBIx::Class, and if you intend to slowly refactor the database back to sanity, the custom methods you've added to do funky things in your ORM classes serve as a sort of TODO list of things to fix.

    As to the startup overhead of DBIx::Class, that is a real concern at this point, especially if your code runs in short-lived processes. Your programmer could have been talking about either or both of the following startup performance hits:

    First, he could be talking about dynamic schema loading via DBIx::Class::Schema::Loader at app startup. This will always be slow; there's not much we can do about that. On the other hand, dynamically loading the schema from the database via Loader is really only recommended for development work and/or trivial schemas. For a production application, nobody would recommend using Loader at runtime.

    You can either manually define the corresponding DBIx::Class schema, or you can use DBIx::Class::Schema::Loader in "one-shot" mode to generate the schema classes on disk each time you make schema modifications in development, and then use those static classes in production.
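
    A minimal sketch of the one-shot approach; the class name, dump path, and connect info here are just placeholders. make_schema_at is the Loader function that writes the static classes to disk:

        # dump_schema.pl - rerun whenever the database changes in development
        use strict;
        use warnings;
        use DBIx::Class::Schema::Loader qw(make_schema_at);

        make_schema_at(
            'MyApp::Schema',                  # schema class name to generate
            { dump_directory => './lib' },    # write static .pm files here
            [ 'dbi:mysql:database=myapp', 'user', 'pass' ],
        );

    Production code then just says "use MyApp::Schema;" and pays no introspection cost at startup.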

    The other startup-time performance concern is DBIx::Class's use of Class::C3 for sane multiple inheritance. Class::C3's initialization involves considerable overhead at startup time, and some minor performance overhead at runtime. This startup hit is considerably less than the Loader one above, but you'd still notice it in profiling, especially for short-lived processes.
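
    For the curious, here's roughly what Class::C3 buys you, with throwaway package names; the one-time startup cost lives in the Class::C3::initialize() call:

        package A;
        sub hello { 'A' }

        package B;
        use base 'A';
        use Class::C3;

        package C;
        use base 'A';
        use Class::C3;
        sub hello { 'C' }

        package D;
        use base qw(B C);
        use Class::C3;

        package main;
        Class::C3::initialize();   # computes every C3 MRO: the startup hit
        print D->hello, "\n";      # C3 order is D, B, C, A, so this prints 'C';
                                   # Perl's default depth-first search finds A's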

    I'm in the process of working up a patch against bleadperl to put C3 method resolution (as an optional per-package pragma) into the perl core, which is about the only sane way to improve the situation given the deep interactions with perl's method caching. I can't really offer any guarantees as to when this will be completed and/or if p5p will accept it into the canonical perl core (hopefully in time for 5.10.x?), but I'm actively working towards these goals.

    -- Brandon

      I know DBIx::Class can handle fetching data with complex relationships. Can it handle inserts and updates that span multiple tables, without exposing the actual storage details to the user? What if they require custom SQL or custom Perl processing?
        I'm guessing by that you mean insert/update to a database view, or something very like a view. DBIx::Class doesn't yet do that, and I don't know that it ever will. In PostgreSQL you can only do it by defining custom rules for the view AFAIK. I haven't had any personal experience with a database that does it with no extra work, if there are any. Topics like this have been discussed on irc.perl.org #dbix-class before, but implementing them in the generic case is tricky at best.

        Along these lines, the best we can offer at the moment is that you can make a ResultSource class for a view just as you would for a table (if your database doesn't support real views, that can be hacked around as well, by specifying the SQL that generates the view in place of the table name, IIRC). You could then override the update and insert methods for this source with your own custom code that breaks the data apart and writes to the underlying tables (either through DBIC abstractions of those tables, or with raw SQL via a dbh accessor).
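
        A minimal sketch of that override approach, with invented view, table, and column names (insert here is deliberately simplified; it fans the write out to two hypothetical underlying tables):

            package MyApp::Schema::Person;
            use strict;
            use warnings;
            use base 'DBIx::Class';

            __PACKAGE__->load_components('Core');
            __PACKAGE__->table('person_view');   # a real view, or SQL posing as one
            __PACKAGE__->add_columns(qw(id first_name last_name city));
            __PACKAGE__->set_primary_key('id');

            # Intercept writes and break the data out to the real tables.
            sub insert {
                my $self = shift;
                my $dbh  = $self->result_source->schema->storage->dbh;
                $dbh->do('INSERT INTO names (first, last) VALUES (?, ?)',
                         undef, $self->first_name, $self->last_name);
                $dbh->do('INSERT INTO addresses (city) VALUES (?)',
                         undef, $self->city);
                return $self;
            }

            1;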

        I was wondering whether it might be possible to handle inserts and updates to the related tables with triggers. Since you'll have the columns of the other tables defined in this object anyway (in a TEMP column group), that should be relatively painless.
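
        A minimal sketch of that trigger idea in Class::DBI-style syntax, with invented class and column names (something similar should be doable through DBIC's CDBI compatibility layer):

            package MyApp::Agent;
            use strict;
            use warnings;
            use base 'Class::DBI';

            __PACKAGE__->table('agent');
            __PACKAGE__->columns(All  => qw(id name_part_1 name_part_2));
            __PACKAGE__->columns(TEMP => qw(city));   # really lives elsewhere

            # After the agent row is created, quietly write the TEMP
            # column's value to the related table.
            __PACKAGE__->add_trigger(after_create => sub {
                my $self = shift;
                MyApp::Address->create({ agent => $self->id,
                                         city  => $self->city });
            });

            1;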
Re: Creating a model of a bad database
by Cabrion (Friar) on Dec 12, 2006 at 13:21 UTC
    I have the same issue with a set of sites I inherited, so I'm anxious to see what the other Monks have to say about this. One thing I have been contemplating is moving from MySQL to PostgreSQL, modeling the database correctly, then using updatable views to mimic the old database structure. I'm not suggesting that as a course of action for you, but wouldn't mind a few comments on this approach from other Monks. I'm also contemplating ORM as a solution.
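
    A minimal sketch of the rule-based approach via DBI, with invented table and view names; PostgreSQL's CREATE RULE is what makes the view writable:

        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect('dbi:Pg:dbname=myapp', 'user', 'pass',
                               { RaiseError => 1 });

        # Present the legacy flat layout as a view over the new schema.
        $dbh->do(q{
            CREATE VIEW legacy_agent AS
            SELECT p.id, p.first_name AS name_part_1, p.last_name AS name_part_2
            FROM   person AS p
        });

        # A rewrite rule redirects INSERTs on the view to the real table.
        $dbh->do(q{
            CREATE RULE legacy_agent_ins AS ON INSERT TO legacy_agent
            DO INSTEAD
            INSERT INTO person (first_name, last_name)
            VALUES (NEW.name_part_1, NEW.name_part_2)
        });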
      Cabrion:

      I've used the same technique a few times: Sybase for one project, MS SQL for another. It's a very nice technique. I also like to remove permissions from the underlying objects where possible. Then, every time I get to modify one of the client apps, I fix 'em, and try removing more permissions and such.

      The only problem with such an incremental approach is that you often don't finish. (I've got one in a rather odd state right now.) Then if you have to turn over the project to someone else, they may be confused by it. Hopefully, they have a well-developed sense of "taste" and will continue the migration towards a rational end.

      On my last project, I was the fourth person to work on it. You could plainly see that each one had their own philosophy. I didn't want a fifth philosophy in the code, so I just adopted the one that the majority of the code used. As for the database, it was just a bunch of tables. No relationships at all! Needless to say, I started tying things together in there. Another release or two, and at least the database will be in good shape. (Heaven help the code, though....)

      --roboticus

Re: Creating a model of a bad database
by Herkum (Parson) on Dec 12, 2006 at 19:02 UTC

    Personally, I would write a simple generic ORM layer and then create aliases on top of it.

    The original ORM layer would hook directly to the table columns and rows. However, a badly designed and implemented database can make deciphering those table and column names a headache, so you write aliases that make more sense from a programming standpoint, instead of hoping people will know the database schema.

    For example, say you have a table that looks like this:

    Table: Agent

    Columns
    name_part_1 - a person's first name
    name_part_2 - a person's last name
    name_part_3 - a person's middle name

    # Standard ORM accessors expose the raw column names:
    print "First Name: "  . $agent->name_part_1 . "\n";
    print "Last Name: "   . $agent->name_part_2 . "\n";
    print "Middle Name: " . $agent->name_part_3 . "\n";

    # Alias methods that hide the schema's naming:
    sub get_first_name  { my $self = shift; return $self->name_part_1; }
    sub get_last_name   { my $self = shift; return $self->name_part_2; }
    sub get_middle_name { my $self = shift; return $self->name_part_3; }

    # Using the alias layer:
    print "First Name: "  . $agent->get_first_name  . "\n";
    print "Last Name: "   . $agent->get_last_name   . "\n";
    print "Middle Name: " . $agent->get_middle_name . "\n";

    While an alias layer is not as good as refactoring, it can make your database appear to be refactored.

Re: Creating a model of a bad database
by siracusa (Friar) on Dec 13, 2006 at 20:04 UTC

    First let me say that I'd hesitate to use any ORM in a plain CGI environment where the entirety of the Perl code has to be loaded into memory on every request, and is then discarded at the end. (And forget about dynamically loading table information from the database, since that'd have to be done on every request.) In such an environment, startup time is critical. Adding more, bigger Perl modules to load is not a good thing.

    But perhaps there are other, larger performance issues in your case and you think you can afford to load another few thousand lines of Perl code on every request. If that's the case, then an ORM can help you, but it will probably not get you all the way to your goal of "refactor[ing] the awful database design, [and] building a clean object layer (the model) on top."

    As has been noted, it's not retrieving data that's the problem. Modern Perl ORMs can fetch data from multiple tables simultaneously using JOINs and the like. The problem is going in the other direction: insert, update, and delete operations on a single Perl object mapping to an arbitrarily complex series of inserts, updates, and deletes in the database across many tables.

    If your database has very flexible updatable views, and you can get by just using those, then maybe you can point your ORM of choice at the views and be all set. But most databases have severe limitations on updatable views, so this solution probably won't get you all of the way.

    The solution that I think has the most promise is this:

    1. Pick an ORM to front all of your existing, poorly factored tables.
    2. Write another layer of your own code to provide the "real" interface that you will use. This layer should use the ORM to do its dirty work, but need not have any relation to the ORM API (but see below).

    I think that's the best way to really isolate the ugly db design from your "end user" code. You still get the benefits of the ORM in that you don't have to hand-write SQL or deal with the vagaries of column data formatting and so on, but you maintain complete independence of the API and the db design. As you refactor the database, you should eventually be able to make this top layer API a trivial/automated wrapper for the underlying ORM (as the tables begin to match up better with the idealized top-level classes).
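
    To make that concrete, a minimal sketch with invented names: MyApp::Customer is the top layer, and MyApp::DB::CustRec stands in for whatever ORM class fronts the ugly table (the calls here are Class::DBI-flavored):

        package MyApp::Customer;
        use strict;
        use warnings;

        # Callers see clean behavior; the ORM and the schema stay hidden.
        sub find_by_email {
            my ($class, $email) = @_;
            my $row = MyApp::DB::CustRec->search(cust_email_addr => $email)->first
                or return;
            return bless { row => $row }, $class;
        }

        # Expose responsibilities, not columns.
        sub full_name {
            my $self = shift;
            return join ' ', $self->{row}->name_part_1, $self->{row}->name_part_2;
        }

        sub change_email {
            my ($self, $new_email) = @_;
            $self->{row}->cust_email_addr($new_email);
            $self->{row}->update;
        }

        1;

    As the schema gets fixed, methods like full_name collapse into trivial delegation to the underlying ORM object.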

Re: Creating a model of a bad database
by EdwardG (Vicar) on Dec 13, 2006 at 13:51 UTC

    Hi Ovid

    You don't give an example of the awful database design, but assuming it's similar to some of my experiences, I would offer these few words of caution:

    Difficulties arising from ambiguities in the database design should not be underestimated. Columns (attributes) that are semantically overloaded or poorly defined will continue being problematic regardless of abstraction.

    Lack of normalisation, and the accompanying risk of data corruption, is easy to overlook when building an object-relational mapping or a phrasebook. Can you afford to aim for 3NF? (Can you afford not to?)

    And finally, some rhetoric which may or may not be helpful: poor foundations will sabotage anything built on top.

    Cheers, and good luck!