in reply to Advise on sql table mapping.

Basically, you want to find the path connecting two tables. As the total number of tables is likely very finite (I suspect less than 4000, to pull a number out of my ass), and the number of joins likely also is very finite (let's say, each table links to two other tables, so roughly 8000 links) you can use brute force to enumerate all paths between two tables.

As you likely want to find a shortest connection according to some metric, you will have to enumerate them all and then select one of the shortest paths. There can be obviously more than one shortest path.

For generic and relatively efficient path finding, see the A_Star algorithm.

If you want to get ambitious, you can use AI::Prolog or just Prolog directly to let it tell you the links between two tables.

Of course, you can also program the search directly, but this is untested code I write off my cuff, so caveat emptor:

use strict; use Data::Dumper; my @links = map { [/^(\w+)\s+(\w+)/] } split /\n/, <<LINKS; apple bug unid1 unid2 apple cat unid1 unid3 bug dog unid2 unid4 apple dog unid1 unid4 LINKS # All links work both ways: my %links; for (@links) { my ($left,$right) = @$_; # We store all links in arrays $links{$left} ||= []; $links{$right} ||= []; # Links work both ways push @{$links{$left}}, $right; push @{$links{$right}}, $left; }; sub find_joins { my ($start, $goal, $prefix, $visited) = @_; $visited ||= {}; # our breadcrumb trail so we don't run around in +circles $prefix ||= []; # the path so far my $curr = $start; warn "[@$prefix] $start => $goal"; my @results; # All the paths that lead from $start to $end $visited->{$start} = 1; for my $link (grep { ! $visited->{$_} } @{$links{$start}}) { if ($link eq $goal) { push @results, [@$prefix,$start,$goal]; } else { my %v = %$visited; push @results, find_joins($link,$goal,[@$prefix,$start],\% +v); }; }; # Sort by length, then asciibetically @results = sort { scalar @$a <=> scalar @$b or $a->[0] cmp $b->[0] + } @results; return @results; }; print Dumper find_joins( apple => 'bug' ); print Dumper find_joins( apple => 'dog' ); print Dumper find_joins( cat => 'dog' ); __END__ [] apple => bug at tmp.pl line 31. [apple] cat => bug at tmp.pl line 31. [apple] dog => bug at tmp.pl line 31. $VAR1 = [ 'apple', 'bug' ]; $VAR2 = [ 'apple', 'dog', 'bug' ]; [] apple => dog at tmp.pl line 31. [apple] bug => dog at tmp.pl line 31. [apple] cat => dog at tmp.pl line 31. $VAR1 = [ 'apple', 'dog' ]; $VAR2 = [ 'apple', 'bug', 'dog' ]; [] cat => dog at tmp.pl line 31. [cat] apple => dog at tmp.pl line 31. [cat apple] bug => dog at tmp.pl line 31. $VAR1 = [ 'cat', 'apple', 'dog' ]; $VAR2 = [ 'cat', 'apple', 'bug', 'dog' ];

An interesting invariant that you should maybe check in the code would be that the joins must be symmetric, that is, all paths which are found for getting from $x to $y need also be found when getting from $y to $x. I haven't checked that my brute force solution returns symmetric results.

Conversion of the output into SQL statements is left as an exercise.

Replies are listed 'Best First'.
Re^2: Advise on sql table mapping.
by dbmathis (Scribe) on May 18, 2008 at 02:50 UTC

    Ok, so I don't follow the output of:

    print Dumper find_joins( apple => 'dog' ); [] apple => dog at tmp.pl line 31. [apple] bug => dog at tmp.pl line 31. [apple] cat => dog at tmp.pl line 31. $VAR1 = [ 'apple', 'dog' ]; $VAR2 = [ 'apple', 'bug', 'dog' ];


    After all this is over, all that will really have mattered is how we treated each other.

      This is the data structure that is returned from find_joins. The first line seems to be pasted from the source code. The next three lines beginning with [...] are debugging output of find_joins as it progresses its work. See line 31 for that. It outputs the current path, the current start and the current goal. The last lines are the output of Data::Dumper which displays the data structure returned.

      Maybe you can tell me in more detail where you have problems. If it is with complex data structures, maybe perldsc or Quick References Reference help you.

        Corion, your code is nice. After looking at it for awhile it makes sense and it gives me something to start with. I should be able to easily write the the code to output the SQL.

        I will let you know if I run into problems. Thank you very much for all of the great reading material!

        After all this is over, all that will really have mattered is how we treated each other.

        Complex data structures are not a new concept to me although I must admit I have really only worked with no greater than 2 dimensions.

        I have been working with SQL (Oracle, MSQSL and MYSQL) for years which is different. It appears that in perl 3+ dimensions is something I need to familiarize myself with.

        I will read up on the links you provided.

Re^2: Advise on sql table mapping.
by dbmathis (Scribe) on May 19, 2008 at 19:32 UTC

    I have been playing with the code you posted above. It's outputting the correct data but I am not sure why i am seeing some warnings and a few blank line.

    Modified:

    #!/usr/bin/perl -w my @links = map { [/^(\w+)\s+(\w+)/] } split /\n/, <<LINKS; apple bug unid1 unid2 apple cat unid1 unid3 bug dog unid2 unid4 apple dog unid1 unid4 dog owl unid4 unid5 LINKS # All links work both ways: my %links; for (@links) { my ($left,$right) = @$_; # We store all links in arrays $links{$left} ||= []; $links{$right} ||= []; # Links work both ways push @{$links{$left}}, $right; push @{$links{$right}}, $left; } sub find_joins { my ($start, $goal, $prefix, $visited) = @_; $visited ||= {}; # our breadcrumb trail so we dont $prefix ||= []; # the path so far my $curr = $start; # warn "[@$prefix] $start => $goal"; my @results; # All the paths that lead from $start to $end $visited->{$start} = 1; for my $link (grep { ! $visited->{$_} } @{$links{$start}}) { if ($link eq $goal) { push @results, [@$prefix,$start,$goal]; } else { my %v = %$visited; push @results, find_joins($link,$goal,[@$prefix,$start],\% +v); } } # Sort by length, then asciibetically @results = sort { scalar @$a <=> scalar @$b or $a->[0] cmp $b->[0] + } @results; # print the whole thing with indices for $i ( 0 .. $#results ) { print "@{$results[$i]}\n"; } } find_joins( 'apple' => 'owl' );
    Output:
    dbmathis@dbmathis-linux:~$ ./tmp.pl apple bug dog owl apple dog owl Use of uninitialized value in string comparison (cmp) at ./tmp.pl line + 46. Use of uninitialized value in string comparison (cmp) at ./tmp.pl line + 46. dbmathis@dbmathis-linux:~$
    After all this is over, all that will really have mattered is how we treated each other.

      The print statements will also try out to print stuff when no path can be found. You shouldn't print out stuff within find_joins, you should print out the results of the top-level call to it. Leaving the warn statement active would have helped you see the flow of control better.

      Why did you remove the use strict; line? Adding it again points to a potential error.

        I had removed strict and forgot to add it back when I was calling @results from outside the loop.

        Just basically studying your code. Thanks btw. This has really helped me greatly.

        After all this is over, all that will really have mattered is how we treated each other.
Re^2: Advise on sql table mapping.
by dbmathis (Scribe) on May 20, 2008 at 03:05 UTC

    Ok, here is my first attempt to converting your output into SQL. I have not added the table aliases into the code yet, and I am sure you can do this with far less code so let me know what you think.

    I think the function calling itself makes sense now. I assume you are doing this to populate prefix. I may be wrong though.

    #!/usr/bin/perl -w use strict; my @links = map { [/^(\w+)\s+(\w+)\s+(\w+)\s+(\w+)/] } split /\n/, <<L +INKS; apple bug unid1 unid2 apple cat unid1 unid3 bug dog unid2 unid4 apple dog unid1 unid4 dog owl unid4 unid5 LINKS # All links work both ways: my %links; for (@links) { my ($left,$right) = @$_; # We store all links in arrays $links{$left} ||= []; $links{$right} ||= []; # Links work both ways push @{$links{$left}}, $right; push @{$links{$right}}, $left; } sub find_joins { my ($start, $goal, $prefix, $visited) = @_; $visited ||= {}; # our breadcrumb trail so we dont go in circles $prefix ||= []; # the path so far my $curr = $start; #warn "[@$prefix] $start => $goal"; my @results; # All the paths that lead from $start to $end $visited->{$start} = 1; for my $link (grep { ! $visited->{$_} } @{$links{$start}}) { if ($link eq $goal) { push @results, [@$prefix,$start,$goal]; for my $i ( 0 .. $#results ) { for my $ii ( 0 .. $#{$results[$i]} ) { if ( $ii == 0 ) { print "$results[$i][$ii]\n"; } else { my $keys; for ( @links ) { my ( $left, $right, $lid, $rid ) = @$_; if ( $left eq $results[$i][$ii - 1] && $ri +ght eq $results[$i][$ii] ) { $keys = "on $lid = $rid"; } } print "inner join $results[$i][$ii] $keys\n"; } } print "\n"; } } else { my %v = %$visited; push @results, find_joins($link,$goal,[@$prefix,$start],\% +v); } } # Sort by length, then asciibetically @results = sort { scalar @$a <=> scalar @$b or $a->[0] cmp $b->[0] + } @results; } find_joins( 'apple' => 'owl' );

    apple inner join bug on unid1 = unid2 inner join dog on unid2 = unid4 inner join owl on unid4 = unid5 apple inner join dog on unid1 = unid4 inner join owl on unid4 = unid5
    After all this is over, all that will really have mattered is how we treated each other.

      Exactly - $prefix contains the path taken so far, that's what the recursion does.

      Again, why are you trying to print out the stuff from within find_joins? Please, don't do that. Just don't. Take the results of find_joins and then, convert them to the output format. It's much easier that way instead of trying to make a recursive function output the stuff you want at the end.

      Your way of iterating over arrays is cumbersome, as you never need the array index anywhere:

      for my $i ( 0 .. $#results ) {

      is easier written as

      for my $element ( @results ) {

      But other than that, your code is close once you just take the data structure returned by find_joins and work on that:

      # Build a lookup table for "left_table;right_table" => [ join columns +] my %link_columns = map { $_->[0] . ";" . $_->[1] => [ $_->[2], $_->[3] ], $_->[1] . ";" . $_->[0] => [ $_->[3], $_->[2] ] } @links; my @results = find_joins( 'apple' => 'owl' ); die "No path from 'apple' to 'owl'" unless @results; my $shortest_path = $results[0]; my $start = shift @$shortest_path; print "SELECT * FROM "; print "$start\n"; my $left = $start; for my $table (@$shortest_path) { my $key = "$left;$table"; print "-- $key\n"; my $join_cols = $link_columns{$key}; print " inner join $table on $join_cols->[0] = $join_cols->[1]\n"; $left = $table; };