Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Re: Re^3: Efficiently Walking Large Data Structures Stored in Multiple Tables

by dragonchild (Archbishop)
on Feb 28, 2004 at 18:51 UTC ( [id://332504]=note: print w/replies, xml ) Need Help??


in reply to Re^3: Efficiently Walking Large Data Structures Stored in Multiple Tables
in thread Efficiently Walking Large Data Structures Stored Across Multiple Tables

If you're dealing with a database that can handle tree structures natively, maybe. But, I read the articles on doing tree structures as sets and I was ... underwhelmed. If all you need is the next page and there is only ever one next page, then deal with it that way. If you need a true parent-child structure and a way of asking for all parents without wanting to explicitly code it into your SQL and have it handled correctly by the optimizer ... good luck without Oracle or PostgreSQL.

Now, you could build the queries using some module (???), but the queries are very very very slow and very very very difficult to get right. Again, good luck. I wouldn't want to try ...

------
We are the carpenters and bricklayers of the Information Age.

Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

  • Comment on Re: Re^3: Efficiently Walking Large Data Structures Stored in Multiple Tables

Replies are listed 'Best First'.
Re: Re: Re^3: Efficiently Walking Large Data Structures Stored in Multiple Tables
by tilly (Archbishop) on Feb 29, 2004 at 01:42 UTC
    Why underwhelmed?

    I have not had occasion to use the nested set representation, but I have multiple friends who have. And it certainly worked as advertised for them.

    Unless you have some concrete experience of the nested set representation not working as advertised, I'm going to have to heavily discount what you have to say. Even then I'd be interested to see the implementation since I know full well how easy it is to accidentally turn an efficient query into an inefficient one.

      I'm not saying that it works or doesn't work. I have no experience in that regard. What I am saying is that the nested set representation looked like a lot more work than CONNECT BY in Oracle. It also looked like something would be very easy to mess up and that I would want to put in a bunch of PL/SQL routines (or, at least, provide at the DBIx:: level).

      ------
      We are the carpenters and bricklayers of the Information Age.

      Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

        Earlier you wrote, Now, you could build the queries using some module (???), but the queries are very very very slow and very very very difficult to get right. Again, good luck. I wouldn't want to try ... Now you are admitting that you don't have experience to back that dismissal up.

        Fine. Based on reported experience I believe that the queries to do nested set trees are somewhat complex, but if you have a guide (eg Joe Celko's implementation) in front of you it is not that hard to get it right. Furthermore in practice they are quite fast to execute. Modifications being more expensive than selects, which is OK since in practical applications (eg a threaded view like discussion sites do) you tend to find that selects are more common.

        They have further benefits over any kind of iterative requerying in that the usage pattern fits how relational databases are designed. In particular you eliminate a lot of round trips that stress latency and your query optimizer.

        If your situation looks like this, then for all of the apparent complexity, knowing that nested set representations of trees are effective can be a lifesaver. (Quite possibly in a literal sense.)

Re^5: Efficiently Walking Large Data Structures Stored in Multiple Tables
by adrianh (Chancellor) on Feb 29, 2004 at 22:41 UTC
    If all you need is the next page and there is only ever one next page, then deal with it that way.

    I did have the words "If" and "might" in my post y'know ;-) If all you need is the next page then it's obviously overkill. If you're doing more complex stuff a more generic solution is worth looking at.

    Now, you could build the queries using some module (???), but the queries are very very very slow and very very very difficult to get right.

    Erm. No they're not. The queries are damn quick. You have a representation that leverages the things that RDBMS are good at. .

    As for difficult to get right - not particularly in my opinion. Takes a little time to get your head around the representation - but the operations to access nodes and subtrees are pretty simple. Insertion is the only vaguely complicated operation.

    Hmm... I feel a meditation coming on...

      I can see how selection isn't very difficult. Nested set trees look to be optimized for retrieval. My issue is modification. Updates to where you stand in the hierarchy (promotion, for example) aren't incredibly hard, unless your entire department moves with you to someone that never had a department under them before. Deletion doesn't look to be a big deal - you simply have to deal with the same problem you have in CONNECT BY, which is what to do with the entities under you.

      I guess my biggest problem with the whole notion is the fact that I affect record A. In Oracle's tree representation (and I'm assuming PG's, as well), that's the end of it. With nested set trees, I might have to re-number the entire hierarchy's intervals. Granted, that's the pathological case, but it's still something you have to code for. And, that has to be checked in every modification. (Maybe not DELETE, but definitely for INSERT and UPDATE.)

      Another thing that concerns me is how this might potentially affect triggers written against that table. Most developers work with tables that aren't nested sets. (I understand that's how RDBMS's think about things, but most developers of my acquaintance aren't as ... flexible.) The data structure, as a whole, looks very fragile to me. There's a lot of bookkeeping involved (unless I'm missing something obvious).

      ------
      We are the carpenters and bricklayers of the Information Age.

      Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

        I guess my biggest problem with the whole notion is the fact that I affect record A. In Oracle's tree representation (and I'm assuming PG's, as well), that's the end of it.

        Just in case this wasn't clear before. If you're RDBMS supports tree structures natively use them. Using nested sets in this case is obviously a pointless reinvention of the wheel when you already have a native mechanism for querying/updating trees.

        However, if your DB doesn't support tree structures well or if you need to consider the possibility of migration to a DB that doesn't support them then nested sets are pretty darn good in my experience.

        With nested set trees, I might have to re-number the entire hierarchy's intervals.

        Yup. As I mentioned before insertion is O(N). You can make the average case a lot better by sneaky work with the intervals.

        However since the vast majority of applications read a lot more than they change the hierarchy the more efficient queries more than make up for the less efficient inserts.

        Another thing that concerns me is how this might potentially affect triggers written against that table.

        I fail to see how this is any more/less of an issue than any other representation?

        Most developers work with tables that aren't nested sets. (I understand that's how RDBMS's think about things, but most developers of my acquaintance aren't as ... flexible.) The data structure, as a whole, looks very fragile to me. There's a lot of bookkeeping involved (unless I'm missing something obvious).

        You wrap the bookkeeping in a client side API or stored procedures on the database side - just like you would with any other bits of repetitive SQL.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://332504]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others musing on the Monastery: (3)
As of 2024-04-20 16:11 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found