in reply to Normalisation Question (OT but trust me, perl implementation)

Thanks all of you, you've given me a lot to think about. I really appreciate it.

Remember all this when I come back and ask you for the query to make the join to grab all this lovely data back out again, right?



($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss') =~y~b-v~a-z~s; print

Replies are listed 'Best First'.
2Re: Normalisation Question (OT but trust me, perl implementation)
by jeffa (Bishop) on Aug 04, 2003 at 13:39 UTC
    Well ... since you asked. :P Let's use Abigail-II's suggestion as a model - i took liberty and made some changes though:
    mysql> select * from people;
    +----+-------+-------+-------+-------+
    | id | first | last  | birth | death |
    +----+-------+-------+-------+-------+
    |  1 | John  | Smith | NULL  | NULL  |
    |  4 | Sally | Smith | NULL  | NULL  |
    +----+-------+-------+-------+-------+
    
    mysql> select * from characters;
    +----+----------+---------+
    | id | first    | last    |
    +----+----------+---------+
    |  1 | Jake     | Jones   |
    |  2 | EvilJake | Jones   |
    |  3 | Shelley  | Sellers |
    +----+----------+---------+
    
    mysql> select * from people_characters;
    +--------+------------+
    | people | characters |
    +--------+------------+
    |      1 |          1 |
    |      1 |          2 |
    |      4 |          3 |
    +--------+------------+
    
    So, if we want SQL to fetch all characters played by John Smith, we need to join characters to people via the people_characters table:
    my $sth = $dbh->prepare(' SELECT characters.first, characters.last FROM characters INNER JOIN people_characters ON characters.id = people_characters.characters INNER JOIN people ON people_characters.people = people.id WHERE people.first = ? AND people.last = ? ORDER by characters.last, characters.first '); $sth->execute(qw(John Smith)); while (my $row = $sth->fetchrow_hashref) { print "$row->{first} $row->{last}\n"; }
    Now, if these tables were set up properly with Class::DBI, fetching the characters of a given person should be as easy as:
    use MyDBI::people; my ($person) = MyDBI::people->search(first=>'John', last=>'Smith'); my @characters = $person->characters;
    I know that's a lot to digest (and wait until you need to add and delete people and/or characters!!), but hopefully this will be useful to you when you need it. Good luck. :)

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
      There will always be some bizarre cases that don't fit a simple schema, but this is a good start. I also don't like tables which require two words for their name: I prefer a schema that conforms to one table, one noun.

      Here are some ideas to extend it:

      I would propose the term 'player' for anyone or anything that could act out a performance. Thus s/people/players/. Even inanimate objects and other non-people may be players; think "Cujo" or "Otto Pilot as himself" for easy examples.

      I would propose a table 'episodes' to capture multiple titles that are intended to be seen as continuations or illuminations on a single world. Thus, Star Wars has six canonical episodes, while Star Trek has a much larger number in its canon.

      There are many cases where a single character is portrayed by many different players, either as different versions of the character ("young Indiana Jones") or different components of the character ("voice of Tweaky"). I propose to refer to these, if you want to track them separately, as 'aspects' of a single character. The simplest and most common is when a single player plays 'all' aspects throughout the episode.

      A single player may perform more than one aspect and more than one character, too. Michael J Fox often plays at least two aspects of Marty McFly at once, as well as his daughter, his son, and his multiply-great-grandfather.

      I would propose that a 'role' is the conjunction of 'player' vs 'character' vs 'aspect' vs 'episode'. So s/people_characters/roles/. James Earl Jones was the voice of Darth Vader in Star Wars through episodes SW4, SW5 and SW6. That's three roles by my count.

      For more ideas, browse the IMDb.com site for a while and see if you can reverse engineer their tables.

      --
      [ e d @ h a l l e y . c c ]

        Thanks Jeffa, and thanks halley.

        Can I kind of abstract the question a bit, now that I've got two or three schemes which will all fit the purpose, not to mention the SELECT syntax as well?

        My original question is essentially "I know how to do two, but I don't know how to do greater than two".

        My knowledge of normalisation came from a good article on, I think, O'Reilly somewhere, which was based on musical performances. Rather than having a database which says "Paul McCartney played on 'Hey Jude'", "Paul McCartney played on 'Ebony and Ivory'", you have a table where 77 = Paul McCartney, 88='Hey Jude' and 99='Ebony and Ivory' and a "played_on" table which maps 77 to both 88 and 99.

        So that makes sense in terms of normalisation.

        But it's a binary system. There are only two states of Paul McCartney as regards played_on: he played on 'Hey Jude', but he didn't play on 'Smells Like Teen Spirit'.

        What I've got with this more complex data, however, is a more-than-binary system. It's the equivalent of "On 'Hey Jude' he played piano and bass and sang, but on 'I Saw Her Standing There' he played bass and sang, and on 'Helter Skelter' he only played bass".

        You can normalise that only as far as a three-column table, right?

        On song A, performer B performed function C;
        On song B, performer B performed function D;
        On song C, performer A performed function Z;

        The only way you can further normalise it would be to have a table of relationships:

        In song A, you find relationship K;
        In song B, you find relationship L;
        In song C, you find relationship M;

        And in your table of relationships, you find that relationship K == "performer B performing function C" and so on.

        At this point, we're moving into areas where we've normalised to the fullest extent, but the SELECT statements are becoming terribly complex -- at what point do I stop and say "normalised is good, but this level is requiring very complicated selects and adding to the number of database hits"?

        And is that last scheme actually more efficient, just because it's got down to two columns in each table? If I have a table of functions and a table of people and a table of songs and a three-column table with all the mappings, isn't that more efficient?

        P.S. It's Buffy, as people who know me have probably guessed. There's Buffy, there's Faith-In-Buffy's-Body, there's Sex-Toy-Robot-Buffy, there's The-First-Evil-Impersonating-Buffy, all played by SMG of course, but there's also Buffy-as-a-child, in a couple of shows, played by other people. And once, turned-into-a-rat-Buffy played by, well, a rat. And the person who needs most normalisation is the creator of Buffy, Joss, who has been writer, director, producer, actor and composer in his time.



        ($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss') =~y~b-v~a-z~s; print