(Ovid) Re: database design issue: one big table vs hundred of smaller tables

I don't see that two tables are going to be a truly viable solution. Your items have sufficiently unique and non-overlapping attributes that trying to stuff them all into one huge table is going to wind up with many items having a lot of blank fields. You're just begging for trouble with that. To distinguish what items have what attributes, you'd either have to create an "itemAttribute" table, describing what items in what categories have what attributes (and that's going to complicate your SQL), or you'll need to move that information into your code which will then require code updates to be more frequent as you're forced to synchronize the code and the database with subsequent updates.

I'd probably start by building a table similar to the following:

nodeID  parentNodeID  Description
------  ------------  -----------
1       1             persons
2       1             kids
3       1             grown_ups
4       4             books
5       4             manuals
6       4             dictionaries
[download]

You can make the nodeID the primary key and have a foreign key constraint from the parentNodeID to the nodeID. To find all root nodes:

SELECT nodeID where nodeID = parentNodeID
[download]

That's fairly easy to drill down in and with proper indexing, should be simple to search. It gets a little more complicated with items because you can't just stick a nodeID in items and then select all items with the appropriate nodeID. Perhaps you could create a meta-table listing the table names of the items you need and create lookup tables with that (mind you, this is all off the top of my head).

Obviously, you want this to be as fast as possible, but I would strongly caution you against designing this system with speed in mind. You want your database schema to be correct. If the schema is poor, you're much more likely to suffer corruption from anomalies. If that starts to happen, you either have to redesign your database correctly, or put in a bunch of extra code to prevent the anomalies. The latter option, of course, is going to have a performance impact, so you'd wind up losing what you might think you gain by trying to optimize up front.

Prematurely optimizing is a common problem. Anyone with any experience with databases knows the value of a properly normalized schema. Since your application has not been written yet, you can't judge whether or not performance is going to be a problem. Why trade the known benefits of normalization with the unknown benefits of optimization when you don't even know if performance is yet an issue?

Cheers,
Ovid

Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

Comment on (Ovid) Re: database design issue: one big table vs hundred of smaller tables Select or Download Code

Replies are listed 'Best First'.
Re: database design by AltBlue (Chaplain) on Nov 27, 2001 at 07:53 UTC
Thank you all for your replies, they offered me some really good points to start digging on. I haven't posted my actual database structure in my first note on this topic so here it is: -- 'attrs' keeps the attributes list in CSV form CREATE TABLE categories ( id INT UNSIGNED NOT NULL UNIQUE AUTO_INCREMENT, name CHAR(255) NOT NULL UNIQUE DEFAULT '', parent INT UNSIGNED NOT NULL DEFAULT '', attrs CHAR(255) NOT NULL DEFAULT '', time TIMESTAMP, PRIMARY KEY(id), KEY(name(16)), KEY(parent) ); -- _[id] is the id of the category as existent in 'categories' table CREATE TABLE cat_[id] ( id INT UNSIGNED NOT NULL UNIQUE AUTO_INCREMENT, name CHAR(255) NOT NULL UNIQUE DEFAULT '', time TIMESTAMP, ... attribute ... attribute ... attribute PRIMARY KEY(id), KEY(name(16)) ); -- bloat table to speed up common search patterns CREATE TABLE items_all ( id INT UNSIGNED NOT NULL UNIQUE AUTO_INCREMENT, name CHAR(255) NOT NULL DEFAULT '', category CHAR(255) NOT NULL DEFAULT '', cat_id INT UNSIGNED NOT NULL, PRIMARY KEY(id), KEY(name(16)), KEY(category(16)), KEY(cat_id), ); [download] As you see, it's quite a clean/lame implementation of the second variant, as stated in my previous post. footpad suggested a 'fields/attributes' table instead of my 'attrs' column in the categories table. It should be an improvement for a large number of attributes, but I'm thinking there will be no such category with more than let's say 20 attributes for each category. Normalization is a good thing, not that I was not thinking about it, but the problem at the first (the second too :) glance is that it can't be no normalization as I want to keep the bazaar feature ;-) meaning that this system should be able to store people as well as books, tools, flavours, images or anything else disregarding any resemblance between categories. In fact the only relation that I'm thinking to enhance somehow is the one of father-child category: as many of you have noted, there could be many overlapping attributes in some cases (humans-:grown-ups-:artists) so maybe there some normalization should be considered. Heh, some OOP techniques would be better in these cases as inheritance. Even more, polimorphism would be needed if i.e. would like to move around items from one category to another (i.e. 'musicians' from 'humans/grownups/artists/' in 'music/') :) Heh, it will get so complex that I think that KISS should come up.For speed up purposes there may be used additional redundant databases containing specialized table in various things etc (as my all_items table). I have already used Swish and maybe I will use it for this project too. Cascade, never heard of, I'll give it a try. tnx. -- AltBlue.	[reply] [d/l]
Re: (2) database design by mandog (Curate) on Nov 28, 2001 at 22:48 UTC
It doesn't feel like you need a tree structure at all. Maybe I am missing something... If you really do need a tree then Joe Celko probably has what you need. (skip the travelogue at the start) `table: Categories: category_desc \| catID table: Items: item_desc \| itemID \| catID table: Attributes attribute_desc \| AttributID \| itemID SELECT category_desc, item_desc, attribute_desc FROM Categories, Items, Attributes WHERE Categories.catID = Items.catID AND Items.itemID = Attributes.itemID;` [download] This scheme has many advantages over the "multiple tables" solution. A couple are: You won't have to add a table every time you add a category. With cascading deletes, you don't have delete tables when you delete a category. This scheme has many advantages over the "one big table" solution. Three are: You won't have to add/delete columns if you add/delete attributes In a normalized table structure you have a lot more flexibility with the queries. if you change a description you will do it in only one place This scheme has no disadvantages over the two you have been considering. If performance is an issue. (and it is unlikely that it is) You are better off exploring caching of one sort or another. Apart from being off topic, you might do better by asking database questions on a database forum. The monks are a nice bunch, but not every problem is a nail. email: mandog	[reply] [d/l]