spx2:

In fact, I do have a bit of experience putting graphs in databases. (I'm currently working on such a project, in fact.)

I don't like the idea of a column having a list of neighbors, though, because it requires you to effectively "jump through hoops" to actually use the data. As you mention, you'll be using split to pull it apart. Similarly, you don't want to have a set of columns to hold your edges, because it limits you to a certain number of edges, and also makes your queries much uglier. In either case, if you actually wanted to write an SQL query to find connect nodes through edges, you'll be fighting the database every step of the way.

There's a much better way: A common idiom in databases is to use a separate table to represent a many-to-many relationship1. Sure, it may take a bit more space in your database, but it's very flexible, and you can do many more queries using it.

For example: Suppose you want to store a directed graph in your database2. You'd use a table to hold your node information, like so3:

create table nodes ( node_id int primary key, name varchar(32) -- other node info goes here )
Next, you create a "many-to-many" table to describe your edges:

create table edges ( from_node_id int not null references nodes(node_id), to_node_id int not null references nodes(node_id), description varchar(32) -- other edge info goes here )
Now you can create a graph that's easy to query using normal SQL, so you don't have to slurp all the data to your local box and then reprocess it. Quickie example:

-- Allowed states for "Daily Work Task" insert graph_nodes values (1, 'Wait/Idle') insert graph_nodes values (2, 'Ready') insert graph_nodes values (3, 'Running') insert graph_nodes values (4, 'Completed') insert graph_nodes values (5, 'Fault') insert graph_nodes values (6, 'Disabled') -- Normal execution path insert graph_edges values (1, 2, 'All prerequisites complete') insert graph_edges values (2, 3, 'User makes request') insert graph_edges values (3, 4, 'Job completes successfully') insert graph_edges values (4, 1, 'NewDay resets us at midnight') -- Abnormal paths insert graph_edges values (3, 5, 'Job faults') insert graph_edges values (5, 6, 'Manual intervention required') insert graph_edges values (6, 1, 'System restarted') insert graph_edges values (5, 2, 'Fault resolved normally') insert graph_edges values (1, 6, 'System halted') insert graph_edges values (2, 6, 'System halted')
Now you can do some simple queries, letting the database do the work for you:

-- Where can I go from node 5? select FR.node_id, '-->', TO.node_id, 'When: ' + E.description from edges E join nodes FR on FR.node_id=E.from_node_id join nodes TO on TO.node_id=E.to_node_id where E.from_node_id = 5
which should yield (untested!):

node_id node_id ------- --- ------- ---------------------------------- 5 --> 2 When: Fault resolved normally 5 --> 6 When: Manual intervention required
Notes:

1: Sure, each edge is a 1-to-1 connection, but from the node's perspective, each node may have 'many' input arcs and 'many' output arcs. So each row of the many-to-many table is a natural description of an arc.

2: If you want a plain (i.e. undirected) graph, then you have two choices: (a) You could rename from_node_id and to_node_id to node_1_id and node_2_id; and modify your queries to take into account that either node ID could be the answer to your query. It's a little more work in your SQL, but not terribly much; or (b) simply add two edges, one from node 1 to node 2, and one from node 2 to node 1. This simplifies your queries at the expense of more database space. (This is what I normally do, as the disk space is easier for me to spend than debugging time....)

3: I'm most familiar with Sybase/MS SQL Server T-SQL syntax, so that's what I'm writing in. You shouldn't have any trouble adapting to Oracle, mySQL, or database of your own choosing.

So if you go to the effort of putting the graph into a database, make sure you take advantage of as many database features as you can to simplify the job--Don't just use it as a "super-big virtual hash".

...roboticus


In reply to Storing a graph in a database by roboticus
in thread storing huge graph by spx2

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.