xunker has asked for the wisdom of the Perl Monks concerning the following question:

Hi all; long time caller, first time listener; I'm looking for a database independent (as much as possible) way to store what I consider to be a complex structure; I want to store a structure resembling an array of lists/array of arrays:
Node0--+--SubNode0--+--SubSubNode0
       |            |
       |            +--SubSubNode1
       |
       +--SubNode1--+--SubSubNode0
                    |
                    +--SubSubNode1
                    |
                    +--SubSubNode2
..into a single field (if possible); And I'd like it to be portable to another DB Engine because I'm using CSV right now, but will probably change over to something like MySQL or even DB2 later on. I have my own crufty ideas about how to do this, but I thought I would ask for enlightenmed before I did anything I'd regret.

-Xunker, JAJPH (Just another Junior Perl hacker)
  • Comment on Storing Complex Structures in a database

Replies are listed 'Best First'.
Re: Storing Complex Structures in a database
by AgentM (Curate) on Sep 30, 2000 at 02:17 UTC
    ooh! You definitely want to look up some info on database normalizing. You could put all this stuff in the same field, but that wouldn't make sense if data is repeated. As an example, you need to normalize tables that have something as simple as storing names and addresses. Try this link.
    AgentM Systems or Nasca Enterprises is not responsible for the comments made by AgentM- anywhere.
Re: Storing Complex Structures in a database
by lhoward (Vicar) on Sep 30, 2000 at 02:32 UTC
    Ovid and AgentM are definitely on the right track.

    However, if all you want to do is take a perl datastructure, and then save it into a DB for later ressurection you should look at Storable, FreezeThaw and Data::Dumper. The process is called serilization and is very DB independant. Just serilize your data structure and shove it in a VARCHAR field. You can retrieve and re-inflate it from that field later. However, this is only an apropriate solution if you do not need to access the contents of the structure until you can "inflate" it back into perl.

      VARCHAR would probably not be a good idea here as the data is likely to exceed 255 characters (MySQL restriction) pretty quickly.

      If the post-er wants to serialize, I'd imagine that blob or text would be a more flexible solution and without the 255 character restriction.

      Personally, I'd go with a single table approach (assuming that each node has only a single parent) using a parent node id field...

      nodes
      node_id (UNIQUE INT)
      parent_node_id (INT)
      data (whatever is relevant)

      That's just an off-the-cuff response, but as your nodes and subnodes theoretically contain the same information I wouldn't go sticking them in different tables. This way works by simply adding a parent_node_id column that would allow you look either up or down the node tree any arbitrary amount. It's not necessarily the fastest way (because of the multiple selects to move up or down the tree, so as the nodes reach 'n' levels they take, what is it, O(n) time) but it represents a relatively consistent interface that won't throw you into problems with your database.

      A question of tradeoffs I suppose.

        I should have been clearer. When I said VARCHAR what I meant was "some sort of character storage that is of variable length" not necessarily the VARCHAR type. The exact name of the apropriate type will vary based on your database and your storage needs.
(Ovid) Re: Storing Complex Structures in a database
by Ovid (Cardinal) on Sep 30, 2000 at 02:26 UTC
    For your own sake, listen to AgentM for he knows whereof he speaks.

    What I'm seeing at a glance from your diagram is, at the minimum, three tables with an unknown number of fields per table. Database normalization seems rather simple, but it gets complicated the larger the database. In fact, I think it would be safe to say that I have met MANY programmers who know SQL and database programming like the back of their hands yet don't know anything about database design.

    Trust me, you don't want that data in a single field.

    Oh, and welcome to Perlmonks :)

    Cheers,
    Ovid

    Join the Perlmonks Setiathome Group or just go the the link and check out our stats.

Re: Storing Complex Structures in a database
by clemburg (Curate) on Sep 30, 2000 at 15:42 UTC

    If you want to be really database-independent, then think about this:

    First:

    • SQL is currently not database independent
    • parsing SQL is too hard
    • means you have to build on top of that

    Second:

    • it is very easy to solve your problem for each specific database engine
    • it is very hard to solve it in general

    Therefore:

    • design an API (a bunch of functions) that does the tasks you need
    • build a simple interpreter function that calls these functions (an "apply()" for your API)
    • then you can simply re-implement these functions for each new database engine (which is trivial for a single engine)
    • and add a handler for the engine to a table of database engine handlers
    • make the interpreter take a reference to the appropriate engine

    I used this approach for a lot of database-handling tasks from CGI scripts to online-database-coupling-interfaces, and it scales very well.

    Specifically, I believe that anything relying on somehow working with "standardized" SQL is doomed from the beginning. The different SQL dialects are simply to different to achieve this.

    If you want to be database independent, you will need an abstraction layer above the database level - and this means no particular database engine features may surface at this layer of abstraction.

    Just my 2 cent ...

    Christian Lemburg
    Brainbench MVP for Perl
    http://www.brainbench.com

Re: Storing Complex Structures in a database
by Anonymous Monk on Sep 30, 2000 at 11:41 UTC
    You can use a self-referential table to store a tree structure. To reproduce the tree, you'll find DBIx::Tree useful. It is also portable between DBD::*.
Re: Storing Complex Structures in a database
by Fastolfe (Vicar) on Sep 30, 2000 at 03:01 UTC
    Take a look at the other nodes first, but as a supplement/alternative, consider building two tables. In the first table, define a "node" and give each node a unique ID. In the second table, define parent-child relationships. I.e. for every child that a parent has, we will see one row in the second table with the parent's ID and the child's ID. When you wish to get a list of children for a parent node, just select child ID's from the 2nd table where the parent ID equals the current node's ID.
RE: Storing Complex Structures in a database
by little (Curate) on Sep 30, 2000 at 12:37 UTC
    I used to use a 1:n Relation for this, e.g.:
    ID ParentID title text 1 0 bla I would.. 2 1 re:bla I better would ... 3 1 re:bla Don't you.. 4 2 re:re:bla This is an ...
    Though each item in your data only needs to know it's parent. This works fine for me with *.csv files. When you use DBI::CSV you will be able to use all files in a specified folder like tables, whereas the folder would be your DB. AND you can perform SELECT on it.
    So using the DBI will grant you the ability to upgrade to a real DBMS later.
    You'll do yourself a favour if all of your indices start not at zero but at one, because DB indices start usually at 1, so in above example a ParentID of O would always mean, that this entry belongs directly into the tree root, but the tree root is the only element that contains iteself (as long as you use only one *.csv file).
    Have a nice day
    All decision is left to your taste
Re: Storing Complex Structures in a database
by 2501 (Pilgrim) on Sep 30, 2000 at 06:26 UTC
    what about treating this problem in a binary search tree sort of fashion?

    you could have a single array of data objects, and each object has methods for setting & returning the values of Child1 & Child2 (which point to other data objects within your array), the Data element and optionally depending on what you are doing, the parent. This method would also be ideal for keeping things sorted and searchable by the data value. This method is makes it quite easy to keep an ordered list of maintainable data.
RE: Storing Complex Structures in a database
by Anonymous Monk on Sep 30, 2000 at 21:39 UTC
    Hmmm... XML?
Re: Storing Complex Structures in a database
by princepawn (Parson) on Oct 02, 2000 at 17:43 UTC
  • I dont know how database-independent you want to be, but it seems that you are fixed on Relational DBMSes when tree structures are easier mapped to Object databases. The free one with excellent Perl interface written by Lincon Stein is called Acedb. visit http://STEIN.CSHL.ORG for more details
  • Lincoln Stein has also written Boulder which allows for creation of storage of nested Perl structures.
  • Depending on where your data is coming from, you may want to take a look at DBIx::Tree which creates tree structures out of self-referential database tables.