Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

SQL Joins across Subclasses

by djantzen (Priest)
on Jan 11, 2003 at 11:04 UTC ( [id://226068]=perlquestion: print w/replies, xml ) Need Help??

djantzen has asked for the wisdom of the Perl Monks concerning the following question:

I'm currently sketching out an object model where I have a superclass that holds about five attributes shared among the subclasses, with a couple of attributes in each subclass that are unique to it, and I'm trying to decide on a good relational table layout for the classes. In the past I've sometimes found it more worthwhile to have a single table that all the classes shared, just so long as their differences where minor and their unique attributes were not mutually exclusive. In other cases, where there was more difference than commonality, I've had entirely separate tables for each subclass. In this case though, I've got the bulk of the attributes shared and implemented in the superclass, with a minority of class-specific, and mutually exclusive attributes in the subclasses.

Asset attributes: asset_id desk status priority deadline / \ DocumentAsset attributes: MediaAsset attributes: doc_id media_id label
Note: a concrete subclass will have both an asset_id and either a doc_id/label or media_id.

My question is whether it makes sense to break this out into three tables and use joins to assemble the data into an object instance, or combine them into a single table where the doc_id, media_id, and label fields can be NULL (since you can have at most two of the three at any time)? Perhaps set up constraints that prohibit any one row from having values set for both doc_id and media_id?

Thanks for your input, fever

Replies are listed 'Best First'.
•Re: SQL Joins across Subclasses
by merlyn (Sage) on Jan 11, 2003 at 16:44 UTC
    If you use PostgreSQL, you can declare the derived table types as truly derived. Then a query against the parent type by default does an implicit join over all subtypes (unless you add an ONLY keyword). It's pretty slick.

    If you have a choice of databases, choose PostgreSQL. Particularly if you are using something like MySQL only out of inertia.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

Re: SQL Joins across Subclasses
by pg (Canon) on Jan 11, 2003 at 16:40 UTC
    There are different ways to look at this.

    From a database design view:
    1. If there is no strong reason, I would say always normalize your tables. Otherwise, you may not see a problem now, but big problems in the future. Believe me. The maintainance guys will blame you for the bad design.
    2. However if what you are working on is some sort of data warehouse thing, and those tables have millions of rows. Don't hesitate to denormalize your database. In this case, nobody can blame you, you are not a black sheep. Denormalization is one of the conventional ways used in data warehouse to improve performance. This is on paper.
    From a OO-design view.
    1. You better have three classes defined. This is not related to the background database design, even if you decide to denormalize your tables.

      For example you denormalize those three tables in your post, and only have one asset table, you still should have three classes: asset, media, and document asset. There is no reason to break the OO rules.

      Also you will use lots more memory space, if you combine three objects into one, when you design your classes. There would be space occupied by those NULLs, BLANKs, ZEROes etc.
    As a summary, don't denormalize your tables, unless you have a very good reason. If you have a very good reason, denormalization is definitely not a crime, and there are design methodologies to support you.

    For your classes, always follow the rules.

    Just want to add one more thing:

    For database performance, join is usually not a big performance issue, although it will take some extra time. Because most of the time, join is an indexed access, not a full table scan, as long as your database is reasonably well designed, and your sql statement is carefully tuned.

      Blazing performance in this case is not the primary objective, rather it is maintainability and extensibility. So given that, I'm inclined to take your advice to normalize and use joins.

      As far as the object model goes, you're preaching to choir my friend :^) Even if I were to combine the classes into a single Asset, the differences would be minor enough to incur almost no performance degradation over the model with inheritance. However, as a matter of good OO design the latter clearly wins out.

      Update: In my object model I've decided to solve this dilemma by separating out the database interactivity according to which attributes belong to each class. In other words, Asset is responsible for saving only its attributes in its own table, and DocumentAsset's database methods rely on SUPER to inherit the necessary behavior. Thus, DocumentAsset::save calls $this->SUPER::save() , and then proceeds to save only its distinct attributes in its own table, namely, docid and label. So far so good :^)

      thanks pg.

Re: SQL Joins across Subclasses
by tall_man (Parson) on Jan 11, 2003 at 15:17 UTC
    This sounds like a case of "denormalization for performance," which is considered a bad idea by relational database experts; so are NULL fields. I think they are right, especially in a case like yours where your tables are so small and simple. Joining them is not a big deal, and data integrity will be easier to maintain.

    Here is a site that may help: Database Debunkings

      Actually, it sounds like a case of what CJ Date calls the "First Great Blunder", which is equating object classes with tables (as opposed to domains/data types).

      Date would probably suggest using three tables (common attributes in one, unique attributes in two others) and views to solve the problem that was originally posted. Unfortunately, no RDBMS today supports proper (always updateable) views, AFAIK.

        What do you mean by "equate", and why is that a blunder? Our design for this object-oriented API is generally to allocate a separate table to each class that requires persistence. More specifically, if a class has a one-to-one relation with a particular attribute, that attribute gets a place in the table for the class. If there is a one-to-many relation then we have a separate table for the attributes, keyed to and constrained by a unique id in the class's primary (parent) table. If this is what you're suggesting is a blunder, I'm wondering what terrible fate should have befallen us in the previous four years of development :^)

        As to the rest of your comment, I'm leaning toward the three table layout, although since the subclasses in my object model will require all of the contents in Asset and all of the contents in either DocumentAsset or MediaAsset, a join I think is more appropriate than a view. I think of views as more for taking a subset of data from several tables, but this is really a combination of everything keyed to a unique id.

Re: SQL Joins across Subclasses
by Anonymous Monk on Jan 11, 2003 at 15:06 UTC
    If you suspect that further attributes might get added to the subclasses, you should break them out.

    Otherwise I'd keep them in one table, with a 'type' field that specifies the subclass name. When you fetch the row you can bless it into the appropriate subclass. Then you'll only have accessors to the relevant columns, as defined by the subclasses. All in MVHO.

Re: SQL Joins across Subclasses
by Nitrox (Chaplain) on Jan 11, 2003 at 14:58 UTC
    I've always been partial to allowing the DB to do the grunt of the work, joins would be fine in my opinion.

    -Nitrox

      If you are using PostgreSQL then there's an explicit syntax to enable this sort of thing. PostgreSQL will then allow you to query on the assets table and see all the "objects" that are Assets. If you want to see just the DocumentAsset objects then use document_asset table. So far this schema by default won't create inconsistant Asset objects &em; the asset_id is controlled by a single sequence so if that's enough assurance you can proceed and know that asset_id is unique without explicitly constraining it.

      CREATE SEQUENCE asset_seq; CREATE TABLE assets ( asset_id integer not null default nextval('asset_seq'), desk text not null, status text not null, priority text not null, deadline date not null ); CREATE TABLE documentasset ( doc_id integer not null unique, label text not null ) INHERITS (assets); CREATE TABLE mediaasset ( media_id integer not null unique ) INHERITS (assets);

      If you really have to ensure that asset_id is unique then you have to step outside of SQL. PostgreSQL has these nice inheritance features but doesn't yet allow you to constrain them declaratively. If you add on this PL/pgSQL code then it will constrain the inheritance for you. All of thsi is obviously just a start but it should get you thinking of the possibilities.

      /* From your command prompt type createlang plpgsql <database name> It generates the following code automatically so it isn't nessessary to remember this next bit */ CREATE FUNCTION "plpgsql_call_handler" () RETURNS opaque AS '/usr/local/lib/plpgsql.so', 'plpgsql_call_handler' LANGUAGE 'C'; CREATE TRUSTED PROCEDURAL LANGUAGE 'plpgsql' HANDLER "plpgsql_call_handler" LANCOMPILER 'PL/pgSQL';

      And now to create a PL/pgSQL function and hook it into the right place. Please be aware I just wrote this into a window and didn't try to run it first or even verify that it's syntactically correct. You'll want to read up on PostgreSQL's triggers and look into either implementing this sort of function in PL/pgSQL or maybe it's TCL or Perl implementations.

      CREATE FUNCTION asset_id_ins_upd () RETURNS OPAQUE AS ' DECLARE found INTEGER; BEGIN IF TG_OP = ''UPDATE'' THEN -- Return OK for unchanging asset_id on -- update operations IF NEW.asset_id = OLD.asset_id THEN RETURN NEW; END IF; END IF; SELECT 1 INTO found FROM assets WHERE assets.asset_id = NEW.asset_id; IF FOUND THEN RAISE EXCEPTION END IF; END; ' LANGUAGE 'plpgsql' WITH (isstrict); CREATE TRIGGER assets_asset_id_ins_upd BEFORE INSERT OR UPDATE ON ass +ets FOR EACH ROW EXECUTE PROCEDURE asset_id_ins_upd();

      Fun Fun Fun in the Fluffy Chair

        INHERITS, yes! But unfortunately, I'm not using Postgres. I've got Oracle 8i, and after searching their docs for "inherit", "derive", "subclass", "under" and "child" I can't find an equivalent keyword. All in all that's not a huge loss though, as we've tried to stay DB agnostic in our SQL to as great a degree as possible.

        What are your thoughts on the principle of the matter as far as using joins here though? pg points out below that denormalization for performance is an accepted practice, but this application isn't really going to be hit that hard. Rather I'm concerned more with ease of maintenance and extensibility, working on the assumption that the child tables will expand down the road. Which, upon reflection, favors joins strongly, views a little less so IMO. Hmmm. /me dreams of INHERITS...

Re: SQL Joins across Subclasses
by hardburn (Abbot) on Jan 13, 2003 at 16:23 UTC

    In general, I think object oreinted databases are more about buzzword compliance than really useful. However, I have found problems that seem to lend themselves more naturally to an OO data structure than a traditional relational structure. This seems to be one such case.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://226068]
Approved by valdez
Front-paged by diotalevi
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others musing on the Monastery: (5)
As of 2024-04-18 19:32 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found