comment on

In fact, I do have a bit of experience putting graphs in databases. (I'm currently working on such a project, in fact.)

I don't like the idea of a column having a list of neighbors, though, because it requires you to effectively "jump through hoops" to actually use the data. As you mention, you'll be using split to pull it apart. Similarly, you don't want to have a set of columns to hold your edges, because it limits you to a certain number of edges, and also makes your queries much uglier. In either case, if you actually wanted to write an SQL query to find connect nodes through edges, you'll be fighting the database every step of the way.

There's a much better way: A common idiom in databases is to use a separate table to represent a many-to-many relationship¹. Sure, it may take a bit more space in your database, but it's very flexible, and you can do many more queries using it.

For example: Suppose you want to store a directed graph in your database². You'd use a table to hold your node information, like so³:

create table nodes (
    node_id int primary key,
    name varchar(32)
    -- other node info goes here
)
[download]

Next, you create a "many-to-many" table to describe your edges:

create table edges (
    from_node_id int not null references nodes(node_id),
    to_node_id int not null   references nodes(node_id),
    description varchar(32)
    -- other edge info goes here
)
[download]

Now you can create a graph that's easy to query using normal SQL, so you don't have to slurp all the data to your local box and then reprocess it. Quickie example:

-- Allowed states for "Daily Work Task"
insert graph_nodes values (1, 'Wait/Idle')
insert graph_nodes values (2, 'Ready')
insert graph_nodes values (3, 'Running')
insert graph_nodes values (4, 'Completed')
insert graph_nodes values (5, 'Fault')
insert graph_nodes values (6, 'Disabled')

-- Normal execution path
insert graph_edges values (1, 2, 'All prerequisites complete')
insert graph_edges values (2, 3, 'User makes request')
insert graph_edges values (3, 4, 'Job completes successfully')
insert graph_edges values (4, 1, 'NewDay resets us at midnight')

-- Abnormal paths
insert graph_edges values (3, 5, 'Job faults')
insert graph_edges values (5, 6, 'Manual intervention required')
insert graph_edges values (6, 1, 'System restarted')
insert graph_edges values (5, 2, 'Fault resolved normally')
insert graph_edges values (1, 6, 'System halted')
insert graph_edges values (2, 6, 'System halted')
[download]

Now you can do some simple queries, letting the database do the work for you:

-- Where can I go from node 5?
select  FR.node_id, '-->', TO.node_id, 'When: ' + E.description
from    edges E
join    nodes FR on FR.node_id=E.from_node_id
join    nodes TO on TO.node_id=E.to_node_id
where   E.from_node_id = 5
[download]

which should yield (untested!):

node_id     node_id
------- --- ------- ----------------------------------
      5 -->       2 When: Fault resolved normally
      5 -->       6 When: Manual intervention required
[download]

Notes:

1: Sure, each edge is a 1-to-1 connection, but from the node's perspective, each node may have 'many' input arcs and 'many' output arcs. So each row of the many-to-many table is a natural description of an arc.

2: If you want a plain (i.e. undirected) graph, then you have two choices: (a) You could rename from_node_id and to_node_id to node_1_id and node_2_id; and modify your queries to take into account that either node ID could be the answer to your query. It's a little more work in your SQL, but not terribly much; or (b) simply add two edges, one from node 1 to node 2, and one from node 2 to node 1. This simplifies your queries at the expense of more database space. (This is what I normally do, as the disk space is easier for me to spend than debugging time....)

3: I'm most familiar with Sybase/MS SQL Server T-SQL syntax, so that's what I'm writing in. You shouldn't have any trouble adapting to Oracle, mySQL, or database of your own choosing.

So if you go to the effort of putting the graph into a database, make sure you take advantage of as many database features as you can to simplify the job--Don't just use it as a "super-big virtual hash".

...roboticus

In reply to Storing a graph in a database by roboticus
in thread storing huge graph by spx2

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.