boulevard has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
The following table is from a database:
Col1 Col2 Al Ibrahim Al Bob Al Chuck Al Frank Bob Gary Bob Doug Chuck Doug Chuck Ed Chuck Frank Doug Hamza Gary Ibrahim Gary Hamza This is the above representation of all connections/links/friendship o +f each friend: Al--------Frank /|\ | / | \ | / | \ | / | \ | / | \ | Ibrahim Bob Chuck \ /| / | \ / | / | \ / | / | Gary Doug Ed \ / \ / \ / Hamza

I use SELECT (through recursion) to figure out a friend's friends and their friends, it works fine. I am wondering if there is a way (without using recursion-as it causes too much overhead) to be able to figure out who are, say, "AL" 's friends and inturn their friends. In other words, I need to determine how to find a friend's all the friends and their friends(friend of a friend of a friend and so on) without using recursion. I also did some search on sql and "adjacency list" but didn't find much helpful.

Any help is highly appreciated.
Thanks

Replies are listed 'Best First'.
Re: sql adjacency list
by perl_lover (Chaplain) on Feb 19, 2005 at 10:09 UTC
    Hi boulevard,
    You didnt mention which database you are using. ORACLE support hierarchical relationship with CONNECT BY and PRIOR. But mySQL doesnt support this. You need to write recursion for this. Consider the following table to check how to use connect by.
    create table test_connect_by ( parent number, child number, constraint uq_tcb unique (child) );
    insert into test_connect_by values ( 5, 2); insert into test_connect_by values ( 5, 3); insert into test_connect_by values (18,11); insert into test_connect_by values (18, 7); insert into test_connect_by values (17, 9); insert into test_connect_by values (17, 8); insert into test_connect_by values (26,13); insert into test_connect_by values (26, 1); insert into test_connect_by values (26,12); insert into test_connect_by values (15,10); insert into test_connect_by values (15, 5); insert into test_connect_by values (38,15); insert into test_connect_by values (38,17); insert into test_connect_by values (38, 6); insert into test_connect_by values (null,38); insert into test_connect_by values (null,26); insert into test_connect_by values (null,16);
    select lpad(' ',2*(level-1)) || to_char(child) s from test_connect_by start with parent is null connect by prior child = parent;
    This query will yield result like this
    38 15 10 5 2 3 17 9 8 6 26 13 1 12 16


    - perl_lover
      its not working in sql 2005
Re: sql adjacency list
by dbwiz (Curate) on Feb 19, 2005 at 12:29 UTC
Re: sql adjacency list
by Aristotle (Chancellor) on Feb 19, 2005 at 14:05 UTC

    You will have to run one query per degree of separation.

    I'll assume the following schema (SQLite dialect):

    CREATE TABLE person ( id INTEGER NOT NULL, name TEXT, PRIMARY KEY ( id ) ); CREATE TABLE friend ( person INTEGER NOT NULL, friend INTEGER NOT NULL, PRIMARY KEY ( person, friend ) );

    Then your graph would be

    With this you can figure out a person's friends by joining the tables:

    sqlite> SELECT P1.name, P2.name ...> FROM person P1 ...> JOIN friend F ON P1.id = F.person ...> JOIN person P2 ON F.friend = P2.id ...> WHERE P1.name = 'Bob' ; P1.name|P2.name Bob|Al Bob|Doug Bob|Gary

    For the friends' friends you use a subselect:

    sqlite> SELECT P1.name, P2.name ...> FROM person P1 ...> JOIN friend F ON P1.id = F.person ...> JOIN person P2 ON F.friend = P2.id ...> WHERE P1.id IN ( ...> SELECT F.friend ...> FROM person P ...> JOIN friend F ON P.id = F.person ...> WHERE P.name = 'Bob' ...> ) ; P1.name|P2.name Al|Bob Al|Chuck Al|Frank Al|Ibrahim Doug|Bob Doug|Chuck Doug|Hamza Gary|Bob Gary|Hamza Gary|Ibrahim

    And so on and so forth, analogously:

    sqlite> SELECT P1.name, P2.name ...> FROM person P1 ...> JOIN friend F ON P1.id = F.person ...> JOIN person P2 ON F.friend = P2.id ...> WHERE P1.id IN ( ...> SELECT F.friend ...> FROM person P ...> JOIN friend F ON P.id = F.person ...> WHERE P.id IN ( ...> SELECT F.friend ...> FROM person P ...> JOIN friend F ON P.id = F.person ...> WHERE P.name = 'Bob' ...> ) ...> ); Bob|Al Bob|Doug Bob|Gary Chuck|Al Chuck|Doug Chuck|Ed Chuck|Frank Frank|Al Frank|Chuck Hamza|Doug Hamza|Gary Ibrahim|Al Ibrahim|Gary

    I used SELECT P1.name, P2.name at the top level SELECT for demonstration purposes. In practice you will want SELECT DISTINCT P2.name.

    Makeshifts last the longest.

Re: sql adjacency list
by dimar (Curate) on Feb 19, 2005 at 14:09 UTC

    boulevard,

    Since I specifically remember talking to you earlier about this scenario, here are some additional links that I think you will find useful.

     4Ways(10/10)  StoringHierarchy(10/10)  TreeAlternatives(10/10)  Biology(6/10)

    NOTE: The numbers after the links are my own 'relevance ratings' based on the similarity of the content to your current project and background (i.e., you already have something that "works", and now you just want alternatives)(i.e., you are looking for reader-friendly overview materials on this subject). If this is a fair characterization of your situation, then these are definitely for you.

    NOTE: The links provided here could also be found further down in the articles and nodes in dbwiz's excellent answer.

    NOTE: If you are already using Oracle (which I just blankly assumed you weren't) then these links are probably not as useful, but still good reading in case you're interested. Try the Oracle-specific approach mentioned elsewhere in this thread.

    HTH keywords: adjacency list; lookup table; graph; tree; hierarchical data; relational modeling; structure; org chart; data modeling; DAG; GRAPH; node; edge; algorithms; recursive; cyclic; Oracle; oracle connect by; oracle prior; cfRDBTree;

    update: Aristotle's answer also matches what you were looking for, he's faster with the "create article" button than scooterm :)

Re: sql adjacency list
by jdalbec (Deacon) on Feb 19, 2005 at 15:39 UTC
    If I understand correctly, you're looking for Al's component in the friendship graph, which (assuming friendship is symmetric) is everyone listed in your example.

    Something like this should work, except you'll need to use code specific to your database where I've written the select statements inline.

    @new = ('Al'); while(@new){ $key = shift @new; push @old, $key; @friends = (select col1 from table where col2 = $key, select col2 from table where col1 = $key); %unique = (); @unique{@friends} = undef; delete @unique{@new, @old}; push @new, keys %unique; }
    At this point @old contains the members of Al's component. The code does a breadth-first search. If you prefer depth-first search just change the final push to unshift.
Re: sql adjacency list
by dragonchild (Archbishop) on Feb 19, 2005 at 22:37 UTC
    Aristotle's answer is bang-on, for a SQL solution. There are other ways.

    If

    • Your number of people is small
    • You plan on asking this question a ton of times
    you might want to consider loading everything into a Graph::Undirected and using the graph methods yourself. I'd look at the set of the vertices a distance of 2 from a given vertex. It's really easy to write this method and it will run in near-constant (and un-noticeable) time.

    The major improvement that this has over a SQL solution is that it will scale better the more friendships you have. If you have, say, 50 people in your database and each is friends with 20 people, you have 50 vertices and 1000 edges. If each person meets 5 people, you still have 50 people, but 1250 edges. If done poorly, the friendship table in Aristotle's solution will bog you down.

    I also have a version of Graph::Undirected* that is optimized for this type of analysis (unique edges between vertices) which is up to 4x faster than the generic Graph::Undirected on CPAN.

    *: Jarrko's aware of it.

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re: sql adjacency list
by TedPride (Priest) on Feb 19, 2005 at 10:37 UTC
    The best thing to do from an efficiency standpoint would be to have the entire database in memory at once, with pointers from each record to its "friend" records so access time is at a minimum. Barring this, a flatfile database with line numbers will work fine, as long as you sort in order the lines you need to access at each depth of recursion so seek times are at a minimum. A secondary file should be used for finding the line number corresponding to a name (necessary for starting things off by locating the first record) so the main file can reuse lines and not worry about staying in order.