Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Guys and gals, I am working on a Perl/MySql project at the moment, and there are some database design consideration/heated argument going on. I would like to hear from monks, and also it would be great if monks can point me to some "more authoritative/official" sources on those particular (database design) issues - you know how some time you need those things to convince people:

Issue 1: I have a table that stores all different types of items (ship case, supplier case, pallet etc.) This is one common table. Besides the fact that different types of items share common attributes/columns, obviously some columns are only applicable to particular types. The current design emphasizes the commonality, and if a particular column does not apply to a particular row, that column is simply set to NULL for that row.

Now some people on the team want to split this common table into several tables, and one for each type of item. To me that is absolutely unnecessary. Use a more common analogy: In a company there are managers and other employees, those two types of people share some common attributes, but there are also attributes that apply to one type but not the other. When we deign database, is there a need to use two tables - a MANAGER table and a NON_MANAGER table, or simply use one - EMPLOYEE table. Obviously some columns will be NULL from time to time (that's why we have NULL, otherwise every time one sees a NULL, a new table should be created to avoid NULL). Some people say that breaks NF, but after seriously checked all 5 NF, the one table solution breaks absolutely none NF, although it might waste a bit of space.

Issue 2. Generalization. Something slightly different from above. Say we have a person table (T1), a project table (T2), and bunch of other tables for different entities/things: T3, T4... Tn. We have the need to attach documents to those things - being a person, or a project, or whatever. Say all attachment are store in a table called T'. Now we also need bunch of relational tables to link attachments to entities - the link between person and his attachment/document will be stored in table T1_T', and the link between a project and its documents will be stored in table T2_T', this goes on and on.

I am trying to simplify the data model, and would like to create a higher-level table called ENTITY, which stores nothing but all the identifiers: all employee ID, all project ID, this ID, that ID... Obviously person table will have foreign key referencing ENTITY table, and project table will have the same foreign key, as well as all other tables. Now the link to attachment is really simple, instead of having n relational tables, I only need one: the one between ENTITY table and attachment table.

I believe my design is correct in both cases, and I also believe those are common issues to lots of monks. But other than knowing what is right, I also need to show evidence that what I am doing is common practice and there are theoretical papers/books/TERMINOLOGIES/web sites behind, otherwise the discussion can go on for quite a while...

Thanks in advance. Many thanks!

  • Comment on (OT) Couple of Data Model Design Considerations

Replies are listed 'Best First'.
Re: (OT) Couple of Data Model Design Considerations
by aquarium (Curate) on Aug 14, 2007 at 00:32 UTC
    your questions are totally unrelated to perl...at this stage
    anyway...you should probably have a better grounding in relational database design before starting heated arguments. please refer to http://en.wikipedia.org/wiki/Database_normalization as a starting point for database design. There are more database design related concepts to consider than there is space/time to layout here but, understanding and designing a database to be in 3rd normal form caters for most needs
    i suspect that you will just try to justify your existing design...instead once you fully grasp 3rd normal form (doesn't take a year) you should have a go at drawing an Entity Relationship Diagram (ERD) from scratch.
    as for authoritative books...Structured Systems Analysis and Design by Edward Yourdon is still the baseline that all other methodologies are judged by. it encompasses data normalization and many other rigorous proven methodologies.
    best of luck
    the hardest line to type correctly is: stty erase ^H
Re: (OT) Couple of Data Model Design Considerations
by herveus (Prior) on Aug 14, 2007 at 11:50 UTC
    Howdy!

    When you say that your design is "correct", what do you mean?

    A design might be "correct" but violate first normal form. Conversely, it might be fully normalized but still fail to model the data accurately.

    The "right" approach to your first issue depends on several things. Are there attributes that are rarely used? They might be candidates for placing in one or more side tables. How many attributes do you have?

    Will the data be updated frequently ("transaction processing") or will it be mostly queried ("data warehouse") or will the usage be somewhere in the middle?

    I would first model the data using only as many entities as necessary. Once you have a clean and clear model that describes the entities and their relationships, you can consider implementation details. Don't get too wrapped up in "optimization", as you may well not know clearly which characteristics you need to optimize for. There is not necessarily a single "correct" solution.

    yours,
    Michael Database designer and modeler
Re: (OT) Couple of Data Model Design Considerations
by tinita (Parson) on Aug 14, 2007 at 14:16 UTC
    another point: if there will be many searches on the database and you're worried about performance (normalization can lead to worse performance) please consider using a search engine.

    update: so what about the suggestion to use a search engine is so bad that i get 2 downvotes and no reasons told?

    update2: in case anybody else got this wrong - normalization *can* lead to worse performance for *searches* and i'm not saying don't normalize.

    update 3: "if there will be many searches" doesn't mean that this will influence the speed. it means if there are many searches then it is worthwhile to use a search engine. if you have 3 or 4 damn slow searches a day is not relevant.

    rephrased (i thought it was clear what i meant, but apparently this wasn't the case): if you have a database that needs to be normalized but you have many search queries on it that are actually faster than they would be with a normalized data structure then don't hesitate to normalize but use a search engine for the searches.

      Howdy!

      This is bad advice, taken on its own.

      Denormalization is a species of optimization.

      Don't blithely do or recommend denormalization without specific data supporting the actual, documentable performance issue. If you don't know the actual issue, you might well simply make things worse.

      yours,
      Michael
        This is bad advice, taken on its own.

        Denormalization is a species of optimization.
        who recommended denormalization?
        i said, *if* you tend to not normalize because of performance issues, use a search engine instead.
Re: (OT) Couple of Data Model Design Considerations
by jhourcle (Prior) on Aug 14, 2007 at 14:58 UTC

    Now some people on the team want to split this common table into several tables, and one for each type of item. To me that is absolutely unnecessary

    ...

    I believe my design is correct in both cases, and I also believe those are common issues to lots of monks. But other than knowing what is right, I also need to show evidence that what I am doing is common practice and there are theoretical papers/books/TERMINOLOGIES/web sites behind, otherwise the discussion can go on for quite a while...

    So um ... you've decided that your way is best, but you have absolutely no proof of it, so you want someone else to give you evidence of why your way is better?

    I'm not sure if this is just complete trolling, or if you're serious. If you're serious, I suggest you give up, and listen to the others on the project. Your solution may functionally work, but it's not going to scale well. I suggest that you come up with the questions that you need to ask of the system, and then both you and the other person come up with solutions -- the other person's will likely be simpler, run faster, and can be more efficiently tuned (less overhead from useless indexing, more compact tables, etc.)

    As for references, the best one I have is the Oracle Data Modeling guide, but I don't think it's available unless you take the class.

Re: (OT) Couple of Data Model Design Considerations
by moritz (Cardinal) on Aug 15, 2007 at 20:28 UTC
    Regarding your Issue 1, I'd do the same that is recommended for any type of programming: factor out the things that are common, and store the rest separately.

    For your MANAGER and NON_MANAGER example that would be a table EMPLOYEE or PERSON that stores all common data (like address, payment etc) and a MANAGER-table for all data that only appears for managers.