clinton has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to figure out the best way to map my objects to the database. I started with this...

Take two objects: user and group. They have certain properties in common (eg object ID, parent ID, creation date, status, last modified date), and certain properties unique to each object (eg Group : name, description, User : firstname, surname, email address). So it made sense to have the following:

Table:User Table:Object Table:Group ---------- ------------ ----------- object_id <==> object_id <==> object_id email object_type name firstname parent_ID description surname created last_modified

But then we have an object (announcement) which can be subclassed, so there could be an announcement which is a death_notice, and an announcement which is a wedding.

A death notice only requires one firstname/surname, while a wedding has two. Also a death notice might have different fields (eg charities (as in 'please donate to...') while a wedding may have gift_lists.

Also, while you know that you want to search for a user or a group, you may not know whether you should be searching for an announcement under death notices, in_memoriam, obituaries etc. So there is significant overlap between these object sub-types.

So what are the approaches:

1) Table per announcement type:

Table:Death Table:Wedding ----------- ------------- object_id object_id firstname firstname_1 surname surname_1 short_obit firstname_2 obituary surname_2 flowers message_from_couple charities details date_of_death date_of_wedding

2) One announcements table plus separate tables for extra attributes:

Table:Announcement Table:Death Table:Wedding ------------------ ----------- ------------- object_id object_id object_id firstname short_obit firstname_2 surname obituary surname_2 flowers message_from_couple charitites details date_of_death date_of_wedding

3) One table with redundant columns:

Table:Announcement ------------------ object_id notice_type firstname_1 surname_2 firstname_2 surname_2 summary details date_1 field_1 field_2 field_3

...plus syntactic sugar in the classes to map field_1 to flowers, and date_1 to date_of_death.

This has the advantage of easy searching across these objects, plus simplified storage and only a single join between Table:Object and Table:Announcement, but at the expense of redundant fields.

4) One table for all property values

By this, I mean a completely different style of storage where you have these tables:

Table:Object Table:Properties Table:Object_Properties ------------ ---------------- ----------------------- object_id property_id object_id object_type property_name (eg flowers) property_id ... property_type (eg text) value ... (or) text_value blob_value date_value boolean_value etc

This has the advantage of being completely extensible, easy to search across objects, but is slower to construct each object (although I'd cache the constructed objects) and multi-field searches would involve many more table joins. Also, it is more difficult to conceptualise and is a pain to work with. And I have a feeling that it would scale poorly with the growing number of properties.

I am leaning towards approach number (3) as (in this case) the number of fields that are not in common are similar in nature and so won't be overly redundant, but is this the best approach? At what point (how many fields) does it become a bad approach, and which would be the upgrade choice?

I would appreciate any feedback from the experiences that you've had. I realise that this has nothing to do with Perl, but it is a design issue that most of us have had to deal with at one time or another.

many thanks

Clint

Replies are listed 'Best First'.
Re: Mapping objects to database tables
by derby (Abbot) on Nov 07, 2006 at 13:30 UTC

    Maybe it's just me but I would design the db first and the object layer second - that way external tools (sql, report builders, etc) do not have to filter through your mapping.

    So for your example, I would modify example 1 ... but add a third table called Person and pull the person data out of Death and Wedding:

    Table:Death Table:Wedding Table::Person ----------- ------------- ------------- death_id wedding_id person_id person_id groom_id firstname short_obit bride_id surname obituary message_from_couple flowers details charities date_of_wedding date_of_death
    Where groom_id and bride_id are foreign keys into Table:Person

    -derby
      That makes a certain amount of sense, but doesn't give me the whole solution. I have left the firstname/surname as ordinary text fields because they have no more relevance than as text. There could be a thousand John Smiths who have nothing to join them, and there is no way of knowing whether John Smith 1 is the same as John Smith 2.

      Also, that change doesn't help me as far as specifying a list of properties particular to each announcement sub-class.

        Okay ... for your problem domain, pull them back into the Death and Wedding table (but name them something better than xxx_1 and xxx_2 --- groom_first and bride_first may become un-pc but I'd worry about the downstream maintenance dev finding my home address than a PC fascist).

        As for the abstraction to announcements - I'd model the domain before I'd model a model of the domain. Once you have a handle on the domain, then I'd look for further abstractions based on biz requirements. I've been burned way too many times by meta systems that make adding a new type fairly easy but asking a simple question like "how many deaths notices did we have for November" caused a monster headache of building a slew of objects and then asking each object are you of type "death" and did you occur in "november." While a straight sql to answer that question would take milliseconds - going through the object hierachy takes hours.

        -derby
        ad John Smith's uniqueness:
        I think that derby's data structure exactly reflects described situation. Once you have two or more J.S. _without_ some related rows in other tables, you really cannot distinguish between them. But it is not disadvantage of derby's normalization, it is lack of informations. You can, for instance, add some external (real) unique identifier into database table Person (insurance number? passport number? e.t.c.)

        ad other:
        I don't understand exactly what do you need. But I think that you can create some database views or some more complicated (perl) structures representing derby's data structure in proper format...

Re: Mapping objects to database tables
by perrin (Chancellor) on Nov 07, 2006 at 15:23 UTC
    This is a very old question, discussed at some length by Martin Fowler in his book "Patterns of Enterprise Architecture." For a brief practical discussion, take a look at the Tangram docs for mapping inheritance. Tangram is the only Perl O/R mapper to handle this, AFAIK.
      Thanks Perrin

      That link to Tanagram::Relational::Mappings was very useful indeed - providing a list of the pros and cons of the models I described above. Thanks for that.

      The Martin Fowler book Patterns of Enterprise Application Architecture looks interesting - think I'll give it a read. As somebody with almost zero Java experience, would it be impenetrable or reasonably easy to map to Perl?

        The text of the book is pretty good, but the actual code examples will be hard to follow if you've never used Java. You can take a look at some sample writing from the book on Fowler's website.
Re: Mapping objects to database tables
by merlyn (Sage) on Nov 07, 2006 at 13:21 UTC
      But this isn't about redundant data. There is no repeated data in any of these designs - they are all normalised.

      I am going to read the article, but, unless I'm mistaken (which is entirely likely), this isn't it.

        It's not normalized ... people can die shortly after being married. I could insert something funny here but my wife may be lurking :-)

        -derby