database table advice

coldfingertips has asked for the wisdom of the Perl Monks concerning the following question:

I'm wracking my brain on this one as my database structure-savvyness isn't quite up to par as of yet, so before I make a sad attempt and waste dozens of hours...,I'm here to ask the wise monks to help me get started.

I'm making a link management system which I know upfront I can do 99% of the work. The part that boggles me is how some scripts allow infinite categories (and infinite categories DEEP!). How is it possible to set up infinite levels of categories for each link?

my $sth = $dbh->prepare(
    "CREATE TABLE IF NOT EXISTS categories
    (
       id int auto_increment NOT NULL,
       url VARCHAR(300) NOT NULL,
       primary key  (id)
    )");
$sth->execute();
[download]

I started going this route and using the URL as a common field to relate to another table. That's fine and dandy, I can give a single url a million different categories this way. But I'm not sure how to make categories have categories of their own, especially since the end user will create the levels via a web form during configuration.

Some sample data:

Large Animals  -   Small Animals   - Ocean Life
- 4-legged     -   4-legged        - Under 12"
- 2-legged     -   2-legged        - 12" - 3'
               -      *invertebrate
                          *carnivore
                          *herbavore
[download]

As shown above, there are multiple levels of sub categories for the major categories. I am totally confused at where to start. This will be a MySQL database.

My guess is I'd have one table categories for all the first-level categories. Then I could create tables for every single next-level category saving the information of the categories it relates to in each of them. But THAT seems wrong and definitely a waste.

Comment on database table advice Select or Download Code

Replies are listed 'Best First'.
Re: database table advice by aufflick (Deacon) on Mar 27, 2007 at 02:15 UTC
The database structure for a tree of data is pretty easy - navigating that tree can be tricky. All you need is for each category to have a `parent_id` field, something like the following (untested) postgres DDL: `CREATE SEQUENCE category_ids; CREATE TABLE categories ( category_id int NOT NULL default nextval('category_ids'), parent_category_id int ); CREATE SEQUENCE link_ids; CREATE TABLE links ( link_id int NOT NULL default nextval('link_ids'), category_id int REFERENCES categories, url VARCHAR, description VARCHAR );` [download] Hopefully you can see how that allows for a tree of categories like: Animals{1} / \ Mammals{2} Invertibrates{3} / \ Cows{4} Humans{5} If the numbers in curly braces are the `category_id`s, then, for example, the `parent_category_id` of Mammals is 1 and that of Humans is 2. This maps the tree perfectly, but determining whether, for example, a Human is an Animal but not an Invertibrate is tricky. Oracle has some nice tree extensions to SQL that I have used. I think Postgres does too these days. You can also get an RDBMS neutral implementation by adding a `sort` column where you store binary sort data. Implementing such a beast is an exercise left for the reader ;) Update: what you're looking for in Oracle is "CONNECT BY". SQL 99 has "recursive" query support, but Postgres (and presumably MySQL) don't support it. For an excellent discussion of hierarchical data storage and retrieval in SQL see http://www.yafla.com/papers/sqlhierarchies/sqlhierarchies.htm - for a description of a Postgres implementation see http://openacs.org/forums/message-view?message_id=16799 Update2: The queries in the link posted from mysql.com that use a single parent id look to me like they would slow down significantly for each nesting level you add - and some of their queries enforce a maximum depth. There's only so many times you can type "LEFT JOIN"! Their idea using left and right bears some investigation, but I would be nervous about application code having to keep that data consistent.	[reply] [d/l]
Re: database table advice by whereiskurt (Friar) on Mar 27, 2007 at 02:02 UTC
The question I'm hearing is 'How do I store hierarchical data in a relational DB?' (of course I could be missing the point.) If not, take a peek at Managing Hierarchical Data in MySQL for some more information. This article applies to more than just MySQL of course. I personally really like the 'nested set' structure for hierarchies that don't change a lot. Enjoy! Update: I've just discovered a slightly more verbose example that compares the adjacency list model and the nested set. One thing I haven't seen discussed, yet, is the idea of adding another 'dimension' to the tree. There is a 'left' and 'right' for the nested set values, but what about adding a 'z' field? (depth) I wonder if the penalties of updating the nodes could be reduced with something like that. Just thinking out loud. Kurt PS: What better way to warm up your fingers than hammering out some Perl code! :)	[reply]
Re: database table advice by jhourcle (Prior) on Mar 27, 2007 at 04:47 UTC
I know, it doesn't handle the case of infinately deep categories, but in some cases, using a form of notation may be the better alternative, partcularly if you're using faceted classification in a poly-heirarchy. What we do in that case, is we assign each facet a letter, and we make sure that we emit the facets in a consistent order. So, using your example: A : habitat A1 : land animals A2 : aquatic animals B : animal size B1 : small animal B2 : large animal C : number of legs C0 : no legs C2 : 2 legs C4 : 4 legs D : backbone? D1 : vertebrate D2 : invertebrate D2.1 : invertebrate w/ exoskeleton D2.2 : invertebrate w/out exoskeleton E : diet E1 : carnivore E2 : herbavore So, for instance, a dog might be `(A1;B2;C4;D1;E1)`. Parents to that would be any of : `(A1;B2;C4;D1) (A1;B2;C4;E1) (A1;B2;D1;E1) (A1;C4;D1;E1) or (B2;C4;D1;E1)`. A worm might be `(A1;B2;C0;D2.2)` (no E, as I don't think diet is significant in classifying them, but i could be wrong) Parents would be : `(A1;B2;C0;D2) (A1;B2;D2.2) (A1;C0;D2.2) or (B2;C0;D2.2)`. Now, to search this, we have to use SQL's 'LIKE' command, and we need to make sure that the notation is structured in a way to allow us to unambiguously specify a category. The easiest way is to end each classification within a facet with a dot (to make sure we're not matching 'A11' when we ask for 'A1'), and to prepend each facet in some delimiter (so we don't get facet AA when we want A): dog : ;A1.;B2.;C4.;D1.;E1. worm: ;A1.;B2.;C0.;D2.2. If we want all land animals : `category LIKE '%;A1.%'` If we want all land animals with four legs: `category LIKE '%;A1.%;C4.%'` ... So -- the problems? Really hard to re-arrange after the fact. Benefits? Any depth or number of facets, provided you've allocated enough space for the notation ... of course, the notation can get rather large in complex systems, and it much less efficient to use than the other methods described above.	[reply] [d/l] [select]