Indexing Sublists

httptech has asked for the wisdom of the Perl Monks concerning the following question:

Let's say I have a database-driven site where I want to ask for a set of information about a user, then re-present that information later. Part of the infomation comes from "multiple" select lists. For example the structure could look like: NAME|AGE|HOBBIES where hobbies could be any number of choices from this list:

Perl Programming
Reloading PerlMonks
Retipping Shoelaces
[download]

So, what is the best way to index all of the choices from the list and insert it into the HOBBIES field in the database? Let's say I'm using a SQL database in this case.

What I would likely do is create a second table to contain the list data; for example:

ID|DATA
1|Perl Programming
2|Reloading PerlMonks
3|Retipping Shoelaces
[download]

Then in the user table, I use join on the list to have a string to put into the user table like this: Anonymous Monk|89|1:2:3 Then in my perl code I use split to break up the HOBBIES field and then perhaps show its output by name using an array that I loaded from the database ahead of time, $hobbies[1], $hobbies[2], $hobbies[3] But that just seems so obvious and inelegant, it makes me wonder how the professional Monk might handle this situation. So, how do you like to handle this?

Comment on Indexing Sublists Select or Download Code

Replies are listed 'Best First'.
Re: Indexing Sublists by marcos (Scribe) on May 15, 2000 at 17:09 UTC
If I remember well some database theory, you are using a denormalized database: you have a field in the USER table which actually contains a list of user hobbies: `Anonymous Monk\|89\|1:2:3` [download] From a database-design point of view I think you should have 3 tables: `USERS(USER_ID, NAME) HOBBIES(HOBBY_ID, DESCRIPTION) USER_HOBBIES(USER_ID, HOBBY_ID)` [download] The table USER_HOBBIES describes the relationship between USERS and HOBBIES which is a many to many relationship. If you had this db structure, and your current user is for example "Anonymous Monk", whose USER_ID is 89, you can retrieve his hobbies with a select performing a join: `SELECT HOBBIES.DESCRIPTION FROM USER_HOBBIES, HOBBIES WHERE USER_HOBBIES.USER_ID = 23 AND USER_HOBBIES.HOBBY_ID = HOBBIES.HOBBY_ID` [download] With this approach you don't need the hobbies array, and you don't need to perform any split from your perl script: you move the logic into the db and let the db engine do the work. You may want to evaluate which approach gives the best performance: the denormalized database with perl performing much work, or the db with one more table and the join query. IMHO db engines perform these tasks very well, and AFAIK denormalization is mainly used when designing datawarehouses (which are not actually 'relational' databases) I hope this may help marcos	[reply] [d/l] [select]
RE: Re: Indexing Sublists by marcos (Scribe) on May 15, 2000 at 17:11 UTC
I made a mistake, the query should obviously be `SELECT HOBBIES.DESCRIPTION FROM USER_HOBBIES, HOBBIES WHERE USER_HOBBIES.USER_ID = 89 AND USER_HOBBIES.HOBBY_ID = HOBBIES.HOBBY_ID` [download] Sorry for that marcos	[reply] [d/l]
Re: Indexing Sublists by mikfire (Deacon) on May 15, 2000 at 16:53 UTC
In an othorgonal world, you would never encode information like that into the user table. Instead, you should create a third table that looks like: ID MonkID HobbyID 1 89 1 2 89 2 3 89 3 ... where ID is an auto-incremented key, MonkID is the ID of the Monk in question and HobbyID is a hobby. It may not show in this case, but I have found doing it this way keeps a lot of flexibility in your code and saves some serious rewrite time when your client decides they want the database to work in a radically different way then you had intended. Mik Mik Firestone ( perlus bigotus maximus )	[reply]
RE: Re: Indexing Sublists by httptech (Chaplain) on May 15, 2000 at 17:44 UTC
Very good, I like this. What is the reason for the auto-incrementing primary key? Couldn't you just make an unique index of the MonkID and the HobbyID?	[reply]
RE(3): Indexing Sublists (kudra: composite primary keys--databases) by kudra (Vicar) on May 15, 2000 at 17:57 UTC
As you suggested, you index both MonkID and HobbyID because you're planning to search for all monks with a particular hobby and all the hobbies of a specific monk. But that doesn't mean you won't want a single primary key rather than a composite one. The db theory book I've got here says "When possible we should try to avoid composite primary keys for entities..." although it isn't kind enough to offer a compelling reason, or really any reason at all. If you ever need to refer any of the tuples in the third table within another table, you'll probably want a single key. But I seem to recall that you can use the internal tuple identification (in Postgres, at least, where I believe it is the oid) rather than a sequence for such cases. I can't access the postgres site currently, so I'm unable to verify that, and I don't know if that's possible or recommended with other databases.	[reply]
RE: RE: Re: Indexing Sublists by mikfire (Deacon) on May 15, 2000 at 21:57 UTC
I get nervous doing that - most of my database design is from reading one book many, many moons ago and then a lot of work. I think auto-increment keys like that are cleaner. By using a database, you have kind of already said you are not that concerned with size. An extra INT between friends shouldn't matter. And composite keys, especially when referring to foreign keys, just sound very dangerous to me. But I have no theory to back any of this up - it is your database to design as you wish. Mik	[reply]
RE: Indexing Sublists by Specimen (Acolyte) on May 15, 2000 at 19:07 UTC
Hi Assuming i've understood your question correctly: The usual way to tackle that is to have a user table, a hobbies table and a userhobbies table. The user table would have a user id, each id representing one user (no way!) and the hobbies table has a list of hobbies each with a hobby id (no way!) and say a description or something. The user hobbies table just matches users to hobbies and would normally just be a two column table made of user id and a hobby id. Each user could obviously have multiple entries in that table pointing to seperate hobbies then you just do for example select hobbies.description from userhobbies, hobbies where userhobbies.userid = X and userhobbies.hobbyid = hobbies.id Then just loop through the recordset to get the result. If you have heavy loaded server with very large results getting returned like that then you can use things like bitmaps to represent the data and get all your results from one returned row. It's not normally considered a very beautiful way of doing things (but beauty only matters in women imho :)	[reply]
Re: Indexing Sublists by httptech (Chaplain) on May 15, 2000 at 22:55 UTC
Ok, these are all great pointers. Guess I need to find a book on database theory :) This also simplifies another facet of my problem, which is deleting from the list; as long as I delete from both the hobbies and userhobbies tables where hobbyid=x then I'm all set. Thanks, all!	[reply]