in reply to 2Re: Normalisation Question (OT but trust me, perl implementation)
in thread Normalisation Question (OT but trust me, perl implementation)

There will always be some bizarre cases that don't fit a simple schema, but this is a good start. I also don't like tables which require two words for their name: I prefer a schema that conforms to one table, one noun.

Here are some ideas to extend it:

I would propose the term 'player' for anyone or anything that could act out a performance. Thus s/people/players/. Even inanimate objects and other non-people may be players; think "Cujo" or "Otto Pilot as himself" for easy examples.

I would propose a table 'episodes' to capture multiple titles that are intended to be seen as continuations or illuminations on a single world. Thus, Star Wars has six canonical episodes, while Star Trek has a much larger number in its canon.

There are many cases where a single character is portrayed by many different players, either as different versions of the character ("young Indiana Jones") or different components of the character ("voice of Tweaky"). I propose to refer to these, if you want to track them separately, as 'aspects' of a single character. The simplest and most common is when a single player plays 'all' aspects throughout the episode.

A single player may perform more than one aspect and more than one character, too. Michael J Fox often plays at least two aspects of Marty McFly at once, as well as his daughter, his son, and his multiply-great-grandfather.

I would propose that a 'role' is the conjunction of 'player' vs 'character' vs 'aspect' vs 'episode'. So s/people_characters/roles/. James Earl Jones was the voice of Darth Vader in Star Wars through episodes SW4, SW5 and SW6. That's three roles by my count.

For more ideas, browse the IMDb.com site for a while and see if you can reverse engineer their tables.

--
[ e d @ h a l l e y . c c ]

  • Comment on Re: 2Re: Normalisation Question (OT but trust me, perl implementation)

Replies are listed 'Best First'.
Re: Re: 2Re: Normalisation Question (OT but trust me, perl implementation)
by Cody Pendant (Prior) on Aug 04, 2003 at 22:58 UTC
    Thanks Jeffa, and thanks halley.

    Can I kind of abstract the question a bit, now that I've got two or three schemes which will all fit the purpose, not to mention the SELECT syntax as well?

    My original question is essentially "I know how to do two, but I don't know how to do greater than two".

    My knowledge of normalisation came from a good article on, I think, O'Reilly somewhere, which was based on musical performances. Rather than having a database which says "Paul McCartney played on 'Hey Jude'", "Paul McCartney played on 'Ebony and Ivory'", you have a table where 77 = Paul McCartney, 88='Hey Jude' and 99='Ebony and Ivory' and a "played_on" table which maps 77 to both 88 and 99.

    So that makes sense in terms of normalisation.

    But it's a binary system. There are only two states of Paul McCartney as regards played_on: he played on 'Hey Jude', but he didn't play on 'Smells Like Teen Spirit'.

    What I've got with this more complex data, however, is a more-than-binary system. It's the equivalent of "On 'Hey Jude' he played piano and bass and sang, but on 'I Saw Her Standing There' he played bass and sang, and on 'Helter Skelter' he only played bass".

    You can normalise that only as far as a three-column table, right?

    On song A, performer B performed function C;
    On song B, performer B performed function D;
    On song C, performer A performed function Z;

    The only way you can further normalise it would be to have a table of relationships:

    In song A, you find relationship K;
    In song B, you find relationship L;
    In song C, you find relationship M;

    And in your table of relationships, you find that relationship K == "performer B performing function C" and so on.

    At this point, we're moving into areas where we've normalised to the fullest extent, but the SELECT statements are becoming terribly complex -- at what point do I stop and say "normalised is good, but this level is requiring very complicated selects and adding to the number of database hits"?

    And is that last scheme actually more efficient, just because it's got down to two columns in each table? If I have a table of functions and a table of people and a table of songs and a three-column table with all the mappings, isn't that more efficient?

    P.S. It's Buffy, as people who know me have probably guessed. There's Buffy, there's Faith-In-Buffy's-Body, there's Sex-Toy-Robot-Buffy, there's The-First-Evil-Impersonating-Buffy, all played by SMG of course, but there's also Buffy-as-a-child, in a couple of shows, played by other people. And once, turned-into-a-rat-Buffy played by, well, a rat. And the person who needs most normalisation is the creator of Buffy, Joss, who has been writer, director, producer, actor and composer in his time.



    ($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss') =~y~b-v~a-z~s; print