Normalisation Question (OT but trust me, perl implementation)

Cody Pendant has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.

Re: Normalisation Question (OT but trust me, perl implementation)
by Abigail-II (Bishop) on Aug 04, 2003 at 07:15 UTC

I'd make a second table 'characters', with character names, and a 'character_id' column.

Then I'd make a third table 'plays', with at least two columns 'person_id' and 'character_id'. One might also add a column 'episode_number'. This gives you the information about which actor played which role(s), and if you add the third column, also in which episodes. One could normalize away that third column, by making the 'plays' have three columns: 'plays_id', 'person_id' and 'character_id', and put 'plays_id' vs 'episode_number' in another table.

In yet another table, 'directors', I'd put people and episode_numbers together.

Abigail

[reply]

Re: Normalisation Question (OT but trust me, perl implementation)
by BrowserUk (Patriarch) on Aug 04, 2003 at 07:13 UTC

Something like

Read more... (986 Bytes)

Examine what is said, not who speaks.

"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
If I understand your problem, I can solve it! Of course, the same can be said for you.

[reply]
[d/l]

Re: Normalisation Question (OT but trust me, perl implementation)
by sgifford (Prior) on Aug 04, 2003 at 07:33 UTC

The point of normalization is to avoid duplicating information, to save storage and so that things can't get out of sync. For example, the simplest implementation would be to store <actor_name,role_name>, but if an actor changed their name (say an actress got married, or you realized you misspelled an actor's name), you might have to update hundreds of records, and if you missed some you'd have inconsistency.

There's lots of different ways to normalize, with each one making tradeoffs between convenience, speed, and data duplication.

In your case, I would probably have a PEOPLE table, an EPISODES table, and a ROLES table. I'd create "pseudo-roles" for the various non-acting roles that somebody can play, such as "director" or "writer", perhaps using a special character to indicate that. For example:

PEOPLE
  id|name
  --+----
   1|Carrie Fisher
   2|John Smith

EPISODES
  year|ep#|name
  ----+---+----
  1999|  1|Attack of the Killer Tomatoes
  1999|  2|Attack of the Killer Clowns
  2002|  1|Attack of the Clones

ROLES
  id|name
  --+----
   1|Jake Jones
   2|Mycroft Jones
   3|@WRITER
   4|@DIRECTOR

DID
  person|role|epyr|ep#
  ------+----+----+---
       2|   1|1999|  1    -- Smith plays Jake in 99-1
       2|   2|2002|  1    -- Smith plays Mycroft in 02-1
       1|   1|2002|  1    -- Fisher plays Jake in 02-1
       1|   3|2002|  1    -- Fisher wrote 02-1

One flaw is that if the same actor plays the same role in many episodes, the <person,role> pair will be repeated very frequently, and if you make a mistake with which person plays a role, you may have many records to update. You could normalize that out into a PLAYED_ROLE table, then use id's from that in DID, but I think the increased query complexity isn't worth it.

Updated: Minor formatting and wording changes.

[reply]

Re: Normalisation Question (OT but trust me, perl implementation)
by dws (Chancellor) on Aug 04, 2003 at 08:15 UTC

That suggests separate tables for

Person (person_id, person_details)
Role (role_id, role_details)
Episode (episode_id, episode_details)

PersonProducesEpisode (person_id, episode_id)
PersonPlaysRoleInEpisode (person_id, role_id, episode_id)

The "Actor" role gets collapsed into person, since it's implicit when an Person plays a Role in an Episode.

This structure give you good flexibility. It even allows for a Role (a Character) being played by multiple Persons in a single Episode, as might happen when a character ages, or if there's a flashback to the character in his/her younger days.

[reply]

Re^2: Normalisation Question (OT but trust me, perl implementation)

by Aristotle (Chancellor) on Aug 05, 2003 at 14:06 UTC

Makeshifts last the longest.

[reply]

Re: Re^2: Normalisation Question (OT but trust me, perl implementation)

by dws (Chancellor) on Aug 05, 2003 at 16:14 UTC

Assuming there's a 1:1 relationship between an Episode and its producer, ...

You could assume that, but the cost of being wrong is rather high. By allowing for a 1:n relationship, even if it only happens very rarely, you won't have to rebuild the schema when data arrives about a multi-producer episode.

[reply]

Re: Normalisation Question (OT but trust me, perl implementation)
by PodMaster (Abbot) on Aug 04, 2003 at 08:30 UTC

gmax

Database normalization the easier way

DBSchema::Normalizer

Aside from that, you gotta clarify which Normal Form you're going for. Here's how I see it (aiming for 2NF -- sure Movies aren't Episodes, but the intent is clear enough):

Read more... (9 kB)

MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
** The third rule of perl club is a statement of fact: pod is sexy.

[reply]
[d/l]

Re: Re: Normalisation Question (OT but trust me, perl implementation)

by dws (Chancellor) on Aug 04, 2003 at 08:46 UTC

Clarification: That a Person is an Actor is implied by the fact of their portraying a Character (a Role). Hence, you can collapse Actor into Person, and use a single table.

[reply]

Re: Re: Normalisation Question (OT but trust me, perl implementation)

by eric256 (Parson) on Aug 04, 2003 at 17:13 UTC

A Role should not need both an ActorID and a person ID, since an ActorID is keyed by a single personID, and Actor and Person share a 1-to-1 relation. Very nice presentation though.

[reply]

Re: Normalisation Question (OT but trust me, perl implementation)
by Cody Pendant (Prior) on Aug 04, 2003 at 11:15 UTC

Remember all this when I come back and ask you for the query to make the join to grab all this lovely data back out again, right?

($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss')
 =~y~b-v~a-z~s;                             print
[download]

[reply]
[d/l]

2Re: Normalisation Question (OT but trust me, perl implementation)

by jeffa (Bishop) on Aug 04, 2003 at 13:39 UTC

Abigail-II

suggestion

mysql> select * from people; +----+-------+-------+-------+-------+ | id | first | last | birth | death | +----+-------+-------+-------+-------+ | 1 | John | Smith | NULL | NULL | | 4 | Sally | Smith | NULL | NULL | +----+-------+-------+-------+-------+ mysql> select * from characters; +----+----------+---------+ | id | first | last | +----+----------+---------+ | 1 | Jake | Jones | | 2 | EvilJake | Jones | | 3 | Shelley | Sellers | +----+----------+---------+ mysql> select * from people_characters; +--------+------------+ | people | characters | +--------+------------+ | 1 | 1 | | 1 | 2 | | 4 | 3 | +--------+------------+

characters

people

people_characters

my $sth = $dbh->prepare('
   SELECT characters.first, characters.last
   FROM characters
   INNER JOIN people_characters
   ON characters.id = people_characters.characters 
   INNER JOIN people
   ON people_characters.people = people.id 
   WHERE people.first = ?
   AND people.last = ?
   ORDER by characters.last, characters.first
');
$sth->execute(qw(John Smith));
while (my $row = $sth->fetchrow_hashref) {
   print "$row->{first} $row->{last}\n";
}
[download]

Class::DBI

use MyDBI::people;

my ($person)   = MyDBI::people->search(first=>'John', last=>'Smith');
my @characters = $person->characters;
[download]

jeffa

L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)

[reply]
[d/l]
[select]

Re: 2Re: Normalisation Question (OT but trust me, perl implementation)

by halley (Prior) on Aug 04, 2003 at 17:40 UTC

Here are some ideas to extend it:

I would propose the term 'player' for anyone or anything that could act out a performance. Thus s/people/players/. Even inanimate objects and other non-people may be players; think "Cujo" or "Otto Pilot as himself" for easy examples.

I would propose a table 'episodes' to capture multiple titles that are intended to be seen as continuations or illuminations on a single world. Thus, Star Wars has six canonical episodes, while Star Trek has a much larger number in its canon.

There are many cases where a single character is portrayed by many different players, either as different versions of the character ("young Indiana Jones") or different components of the character ("voice of Tweaky"). I propose to refer to these, if you want to track them separately, as 'aspects' of a single character. The simplest and most common is when a single player plays 'all' aspects throughout the episode.

A single player may perform more than one aspect and more than one character, too. Michael J Fox often plays at least two aspects of Marty McFly at once, as well as his daughter, his son, and his multiply-great-grandfather.

I would propose that a 'role' is the conjunction of 'player' vs 'character' vs 'aspect' vs 'episode'. So s/people_characters/roles/. James Earl Jones was the voice of Darth Vader in Star Wars through episodes SW4, SW5 and SW6. That's three roles by my count.

For more ideas, browse the IMDb.com site for a while and see if you can reverse engineer their tables.

--
[ e d @ h a l l e y . c c ]

[reply]

Re: Re: 2Re: Normalisation Question (OT but trust me, perl implementation)

by Cody Pendant (Prior) on Aug 04, 2003 at 22:58 UTC

Re: Normalisation Question (OT but trust me, perl implementation)
by Bagration (Initiate) on Aug 04, 2003 at 20:17 UTC

There are, relevant for your problem, the tables:

Sometimes you have persons as actors in one episode and as director in another.

Examples:

PERSON_TITLE

cnNR stands for the number in CONTRIBUTORS.

Table Titles

In this episode Duncan McNeill is actor and direktor as well.

So you can guess how the PERSON_TITLE set might change:

Your solution: you only need a further Field in your actors_role table.

I hope this was a bit helpful. Sorry, having watched Star Trek in Germany, I have only the German episode names.

[reply]