ramkamath has asked for the wisdom of the Perl Monks concerning the following question:

This node falls below the community's minimum standard of quality and will not be displayed.

Replies are listed 'Best First'.
Re: tree structure of database
by ELISHEVA (Prior) on Sep 25, 2009 at 04:24 UTC

    Do you mean table count or row count?

    Perhaps you could explain the context of this problem a bit more. Is this work? homework? The context might help people feel more comfortable giving you detailed answers. From your description it could be work ("I have the requirement of maintaining") but the way it is expressed ("Storing a tree structure") sounds a bit like the kind of homework question a database professor might ask if he is trying to help you transition from programming data structures to their relational database equivalent.

    In programming data structures trees are often represented using a parent node with a list of children. In relational databases the storage locations are reversed. The goal in a relational database is to store everything in tables where there is one and only one value for each combination of table column and row. In other words, the database table is something like a spreadsheet with exactly one value in each cell, but much more efficient.

    The most straightforward way of implementing a tree structure in a database is to use foreign key columns. A foreign key column is a database table column that stores the id of a row in a database table, either the current table or some other table. To construct a tree, you set up a database table where one of the columns stores a foreign key. Each record in the table represents a node in the tree. When you create the row for that tree node, you store the parent of the tree node in the foreign key column.

    In a programming data structure you navigate from parent to child by visiting each of the children. You start with a list of top tree nodes, visit each node, get the list of their children and visit each of those children. But how do you do that in a database? Remember, each row only knows its parent.

    First of all, you will need a convention for marking the rows that are top nodes. Logically speaking these are the rows that have no parent, so you might use NULL when you have no parent. To find all of the top nodes, write an SQL query to find all the nodes with a NULL value in that foreign key column discussed earlier. Then to get all of the children of node X, you write another query that selects all of the nodes where the foreign key id equals the row id of node X. Then for each of the rows returned by that query (lets say X1, X2, X3), you write yet more queries: one to retrieve every node where the parent is X1, another to retrieve every node where the parent is X2, and so on. You repeat this query writing process until you find a node with no children.

    More precisely, you write a program to repeat this query writing process. SQL doesn't have a syntax for repeating a query until it returns no results (though it has been discussed and there are published proposals) so this repetitive construction of queries is normally done in a stored procedure or in a language with good database libraries, for example Perl :-).

    It helps a lot if you make sure your database table has a single column as primary key. That way you have a very simple row id which you can use in your queries. You also need to be very careful that your sequence of queries don't cycle. If, for example, A is a child of B which is a child of C which is a child of A, then the queries "all children of A", followed by "all children of B", followed by "all children of C" would turn into an infinite loop when you tried to visit all of the children of C and get their children. One of those children would be "A" and would lead to the creation of the query "all children of A" which is where you started.

    What I have described is the simplest case. There are much more complicated ways to implement tree structures in databases. For example, you could have a tree where nodes can belong to any one of three different database tables, depending on the type of object stored in each node. The idea is similar (using foreign keys to store parents), but the queries to retrieve the children may be much more complicated. Hopefully, if you are just starting with relational databases, you won't need such a complicated tree. If you do need something that complicated, please feel free to post again if you are using Perl on this project. Next time, it would be a good idea to post some code showing what you tried so far.

    Best, beth

      Thank you ......

      am sorry for not giving proper details,since i am new to perlmonks also. It is not home work actually,i am designing one website for comercial usage, in which the number of customers grows in tree structure.I thought of maintaining there details in different tables for each customer please see(http://paisahub.com/Broucher1.aspx).I am trying to design same functionality website.


      Regards ARK

        How can YOU be designing anything intended for commercial usage?!? The design should be done by someone who actually does know at least a tiny little bit about databases and programming in general! You may then do some of the programming if supervised, but design is clearly far out of your current abilities.

        And NO YOU DO NOT WANT TO HAVE DIFFERENT TABLES FOR EACH CUSTOMER. FULLSTOP!

        I pity the guy who ends up having to rework your design after you fail to deliver anything workable!

        Jenda
        Enoch was right!
        Enjoy the last years of Rome.

Re: tree structure of database
by desemondo (Hermit) on Sep 25, 2009 at 04:04 UTC
      Thanks for your Suggestions,

      Actually i am trying to design one website which is almost same as http://paisahub.com/Broucher1.aspx

      Regards ARK
Re: tree structure of database
by planetscape (Chancellor) on Sep 25, 2009 at 06:23 UTC

      I am more than a little puzzled by Joe Celko's advocacy here of what he calls the "nested set model". He appears to be advocating a table that stores visit order instead of parent node ids, like this:

      OrgChart emp lft rgt ====================== 'Albert' 1 12 'Bert' 2 3 'Chuck' 4 11 'Donna' 5 6 'Eddie' 7 8 'Fred' 9 10

      In the table above, the "lft" column holds the visit order of the node when descending through the tree. The "rgt" column holds the visit order when ascending. His argument is that it makes queries for "all children" and "all parents" of a node much easier.

      I can see some argument for it in a read-only database or data mining software, but for modifiable data it is a disaster. To begin with it is severely denormalized. The value in the "lft" is dependent not only on the key, but on every node before it in the visit order. The value in the "rgt" column" is dependent on every node before it and on every child, grand-child, and great-great-great-grandchild node. Together each record is dependent on nearly every other record in the table. This means that it is impossible to insert, delete, or move a single record in the table without editing nearly the entire table as whole, especially if the node is at the beginning of the visit order. If you insert a record, the "lft" and "rgt" column of every record in the visit order after it must be updated. For every parent, the "rgt" node must be updated as well.

      To implement this requires strong transaction support. But even with strong transaction support, editing an entire table for each insertion is a bad idea. For the brief period while the edits are being made the transaction will need an exclusive lock on the table as a whole. Users who only need to read the data or need to insert or delete at another point in the graph will be shut out. This is unacceptable in a high traffic database. For modifiable tables, this implementation is simply not scalable.

      The beauty of the traditional foreign key model is that each insertion of a node affects only the node in question. Insertions are safe actions even in a DBMS with weak transaction support. Each repositioning of a child (change in parent node) similarly affects only one record and can be carried out safely even in a database without transaction support.

      When tree relationships are represented by storing each node's parent (instead of a pair of lft-rgt columns), even deletions affect only a small subset of nodes, not the entire table. When a node is deleted one of two things happen: all of the descendants will be deleted with the node (known as cascading delete) or the immediate children will be assigned to a new node. One can think of this as either (a) the boss and his entire department being axed or (b) the boss being fired and his team assigned to a new manager. In either case, only a subset of rows (employees) is involved, not the whole table (company). If the DBMS supports row level locking, then editors and readers of different branches of the tree will be able to go about their business, just as employees in other divisions can at least try to ignore the firing going on in the cubicles down the hall and get on with work.

      Even in a read-only database, storing tree relationships as lft-rgt values rather than as a parent foreign key can be problematic. "lft"-"rgt" columns are efficient at selecting "all children of" or "all parents of", but they make it very difficult to select "all siblings of" or "all immediate children of". There is no one value one can use to find all of these nodes. Each child only stores the sequence number on ascent and descent and this is different for each child. You can select "all children of X" easily because you can query for all nodes with a "lft" value greater than X and a "rgt" value less than X, but you can't easily tell how deep you are in the tree, i.e. whether you have an immediate child, grandchild, or great-great-great-grandchild.

      Even when one does want "all children of", this encoding is problematic. To construct a visit order we have to make an assumption about the order in which siblings are visited. If you sort rows based on the value of "lft" or "rght", you will get that assumed ordering of siblings. If you want to change the ordering, you will need to track down all of the nodes with a common parent, resort them, and renumber them.

      Hard-coding sorting logic violates one of the main design benefits that motivated the initial development and popularization of the relational model: the ability to provide a measure of future-proofing. Data tends to stay with an organization, but the ways they use it and want to view it change dramatically over time. This includes the ways in which an organization wishes to sort it.

      There are, of course, use cases where speed is much more important than either future proofing or flexibility. In those cases I see nothing wrong with hard coding specific sort orders (data mining of archival data being one common example). But in a live system that needs to support either ad-hoc reporting or real-time editing of a tree, "lft"-"rgt" columns instead of parent foreign keys seem to me a really bad idea.

      Best, beth

        I agree... I would like to just share my experience.

        I implemented some tree data structure in sql a few times. Ever implementation was different, not only due to rdbms capabilities, but due to different requirements. I think it is very important to exactly define what data structure have to be layered and what operations have to be performed on it. For instance, is the order of child nodes important? Is serialization of the huge tree frequent task? What about concurrent access to the nodes of the same branch?

        Like classic normalization, my experience says that it is generally better to minimize logical relations between rows of the same table, otherwise sql queries become mastodonts.