http://qs1969.pair.com?node_id=457924

samtregar has asked for the wisdom of the Perl Monks concerning the following question:

Ahoy Monks. I need an algorithm, I just don't know which one. Here's my problem:

My problem is that cycles give my code indigestion, because I'm trying to evaluate the graph by looking through nodes in order (which is more-or-less random). When I find a case that's causing a cycle (i.e. an A node is linking through a B node back to A via the C link type) I go into the code for type A and apply a local fix. This usually means I create the ID for A early so that B can find it at the right point.

Obviously my solution isn't a good one. It takes time every time a new circular link condition pops up. It's also a ticking bomb since I have no confidence that I've really found them all.

This seems like a problem that should have a known comp-sci solution. Does anyone know it? Links to references are fine.

Thanks,
-sam

Replies are listed 'Best First'.
Re: In search of an algorithm for loading cyclic graphs
by kvale (Monsignor) on May 17, 2005 at 18:50 UTC
    One common algorithm for converting a directed graph into an ordered set (i.e., your sequential IDs) is a topological sort. The problem here is that you have cycles, which generate ambiguous sorts. Thus you cannot accomplish your job as specified.

    Your 'local fix' is to break the cycle by supplying a point outside the set of sequenial IDs. If this is kosher, then I would attempt to solve your problem in three steps.

    1. First, do a breadth-first search of the graph starting from the root node. When you encounter a cycle (a node previously searched), delete that link from the graph and save the link for later use. After all nodes have been searched, you are left with an acyclic graph, or tree.
    2. Second, topologically sort this tree into a sequential set of IDs.
    3. Finally, add back in the deleted links you stored from the first step.
    Update: fixed a couple of typos.

    -Mark

      Thanks, this makes sense. I'm concerned about how this will perform on large graphs but perhaps the only way to find out is to try it. The advantage of my current approach is that it only makes one pass over the data and it's hard to beat that for speed...

      -sam

Re: In search of an algorithm for loading cyclic graphs
by dragonchild (Archbishop) on May 17, 2005 at 18:26 UTC
    • I have a cyclic directed graph.

      [snip]

    • My job is to load the graph from disk into a relational database where links work via sequentially allocated IDs.

    If you have A -> B -> A, what do you expect your table to look like? I don't think that using sequential IDs will work that well ...

    If, however, you are using three tables (One for nodes, one for linktypes, and one for node/node/link), then this is trivial:

    • Load all your nodes
    • Load all your linktypes
    • Load all your links

    But, if it was that easy, you wouldn't be asking the question. *shrugs*


    • In general, if you think something isn't in Perl, try it out, because it usually is. :-)
    • "What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?"
      If you have A -> B -> A, what do you expect your table to look like? I don't think that using sequential IDs will work that well ...

      Table A a_id | b_id ----------- 1 1 Table B b_id | a_id ----------- 1 1

      Each table has its own pool of sequential IDs, like most databases.

      If, however, you are using three tables (One for nodes, one for linktypes, and one for node/node/link), then this is trivial:

      Nope, it's not that simple. Each node type has its own table structure and most of them are more complicated than that.

      -sam

        I'm not understanding something because that looks like you're creating a table for each node type and the rows of said tables are the edges.

        Personally, I would create the tables for the node types and have a completely separate table that actually demonstrates the links between the nodes. I would also treat the nodes both in one tables (called nodes) and in their separate tables who FK to nodes. Kinda like a base class and child classes. That way, you can handle it as a 3-table graph when convenient, but you can also get to the data when needed.


        • In general, if you think something isn't in Perl, try it out, because it usually is. :-)
        • "What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?"
        I'm not sure where the problem is here. If you know that the ids are unique within each table, and you know the table to be looking in for a given id (from the field name), doesn't that uniquely identify what each field is trying to point to?

        If not, why not just assign each table a starting number, with room for a few million (or however many seems appropriate) rows between each starting number?

        What kind of 'indigestion' does your code get?

Re: In search of an algorithm for loading cyclic graphs
by lupey (Monk) on May 17, 2005 at 18:58 UTC
    I'm not sure if this is what you need but have you looked at the CPAN module Graph?

    http://search.cpan.org/~jhi/Graph-0.65/lib/Graph.pod

    lupey

      I second that suggestion regarding Graph. From the user side, I used it recently to Order your autobundle by dependency and it had all the right parts. You may also find something interesting in the innards, although I didn't venture there myself.
Re: In search of an algorithm for loading cyclic graphs
by exussum0 (Vicar) on May 17, 2005 at 18:57 UTC
    Kruskal, Prim.. there are many spanning tree algorithms. Create a spanning tree that is directed from one point and ends at another. Any edge you add will create a cycle. Make your DB follow that.

    ----
    Give me strength for today.. I will not talk it away..
    Just for a moment.. It will burn through the clouds.. and shine down on me.

Re: In search of an algorithm for loading cyclic graphs
by dynamo (Chaplain) on May 17, 2005 at 18:32 UTC
    A couple things about your question don't parse correctly for me.

    What do you mean when you say that you 'apply a local fix' - how is what you describe a 'fix' for anything? (You still end up with the cycles in the database, which I gather you want to do)
    Why would you be creating the id for A 'early' when you would normally generate the ids in order?
    How does this differ from not applying said fix?

      What do you mean when you say that you 'apply a local fix' - how is what you describe a 'fix' for anything? (You still end up with the cycles in the database, which I gather you want to do)

      I mean I fix the code so it doesn't die() or loop forever anymore. A typical break is either an error about not being about the find the object linked to, or worse, an infinite loop loading the same nodes over and over.

      The cycles are perfectly valid, so I'm not trying to weed them out.

      Why would you be creating the id for A 'early' when you would normally generate the ids in order?

      Here's the "normal" course of events in my system:

      • Create an object, unsaved without an ID.
      • Fill it with data. Some of the data may include the IDs of linked objects.
      • Save the object, creating an ID for it.

      That works great until the code finds a cycle. When I run into a cycle I fix it to:

      • Create an object, unsaved without an ID.
      • Fill it with data, but not the data that might cycle.
      • Save the partially filled object, creating an ID for it.
      • Fill in the possibly cyclic pieces.
      • Save the completed object.

      The problem is that doing this for each node type when I find a break is a real pain.

      -sam