Advise on sql table mapping.

dbmathis has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Advise on sql table mapping. by Corion (Patriarch) on May 17, 2008 at 21:53 UTC
Basically, you want to find the path connecting two tables. As the total number of tables is likely very finite (I suspect less than 4000, to pull a number out of my ass), and the number of joins likely also is very finite (let's say, each table links to two other tables, so roughly 8000 links) you can use brute force to enumerate all paths between two tables. As you likely want to find a shortest connection according to some metric, you will have to enumerate them all and then select one of the shortest paths. There can be obviously more than one shortest path. For generic and relatively efficient path finding, see the A_Star algorithm. If you want to get ambitious, you can use AI::Prolog or just Prolog directly to let it tell you the links between two tables. Of course, you can also program the search directly, but this is untested code I write off my cuff, so caveat emptor: use strict; use Data::Dumper; my @links = map { [/^(\w+)\s+(\w+)/] } split /\n/, <<LINKS; apple bug unid1 unid2 apple cat unid1 unid3 bug dog unid2 unid4 apple dog unid1 unid4 LINKS # All links work both ways: my %links; for (@links) { my ($left,$right) = @$_; # We store all links in arrays $links{$left} \|\|= []; $links{$right} \|\|= []; # Links work both ways push @{$links{$left}}, $right; push @{$links{$right}}, $left; }; sub find_joins { my ($start, $goal, $prefix, $visited) = @_; $visited \|\|= {}; # our breadcrumb trail so we don't run around in +circles $prefix \|\|= []; # the path so far my $curr = $start; warn "[@$prefix] $start => $goal"; my @results; # All the paths that lead from $start to $end $visited->{$start} = 1; for my $link (grep { ! $visited->{$_} } @{$links{$start}}) { if ($link eq $goal) { push @results, [@$prefix,$start,$goal]; } else { my %v = %$visited; push @results, find_joins($link,$goal,[@$prefix,$start],\% +v); }; }; # Sort by length, then asciibetically @results = sort { scalar @$a <=> scalar @$b or $a->[0] cmp $b->[0] + } @results; return @results; }; print Dumper find_joins( apple => 'bug' ); print Dumper find_joins( apple => 'dog' ); print Dumper find_joins( cat => 'dog' ); __END__ [] apple => bug at tmp.pl line 31. [apple] cat => bug at tmp.pl line 31. [apple] dog => bug at tmp.pl line 31. $VAR1 = [ 'apple', 'bug' ]; $VAR2 = [ 'apple', 'dog', 'bug' ]; [] apple => dog at tmp.pl line 31. [apple] bug => dog at tmp.pl line 31. [apple] cat => dog at tmp.pl line 31. $VAR1 = [ 'apple', 'dog' ]; $VAR2 = [ 'apple', 'bug', 'dog' ]; [] cat => dog at tmp.pl line 31. [cat] apple => dog at tmp.pl line 31. [cat apple] bug => dog at tmp.pl line 31. $VAR1 = [ 'cat', 'apple', 'dog' ]; $VAR2 = [ 'cat', 'apple', 'bug', 'dog' ]; [download] An interesting invariant that you should maybe check in the code would be that the joins must be symmetric, that is, all paths which are found for getting from `$x` to `$y` need also be found when getting from `$y` to `$x`. I haven't checked that my brute force solution returns symmetric results. Conversion of the output into SQL statements is left as an exercise.	[reply] [d/l] [select]
Re^2: Advise on sql table mapping. by dbmathis (Scribe) on May 18, 2008 at 02:50 UTC
Ok, so I don't follow the output of: `print Dumper find_joins( apple => 'dog' ); [] apple => dog at tmp.pl line 31. [apple] bug => dog at tmp.pl line 31. [apple] cat => dog at tmp.pl line 31. $VAR1 = [ 'apple', 'dog' ]; $VAR2 = [ 'apple', 'bug', 'dog' ];` [download] After all this is over, all that will really have mattered is how we treated each other.	[reply] [d/l]
Re^3: Advise on sql table mapping. by Corion (Patriarch) on May 18, 2008 at 07:49 UTC
This is the data structure that is returned from `find_joins`. The first line seems to be pasted from the source code. The next three lines beginning with `[...]` are debugging output of `find_joins` as it progresses its work. See line 31 for that. It outputs the current path, the current start and the current goal. The last lines are the output of Data::Dumper which displays the data structure returned. Maybe you can tell me in more detail where you have problems. If it is with complex data structures, maybe perldsc or Quick References Reference help you.	[reply] [d/l] [select]
Re^4: Advise on sql table mapping. by dbmathis (Scribe) on May 18, 2008 at 13:41 UTC
Re^4: Advise on sql table mapping. by Anonymous Monk on May 18, 2008 at 12:49 UTC
Re^2: Advise on sql table mapping. by dbmathis (Scribe) on May 19, 2008 at 19:32 UTC
I have been playing with the code you posted above. It's outputting the correct data but I am not sure why i am seeing some warnings and a few blank line. Modified: #!/usr/bin/perl -w my @links = map { [/^(\w+)\s+(\w+)/] } split /\n/, <<LINKS; apple bug unid1 unid2 apple cat unid1 unid3 bug dog unid2 unid4 apple dog unid1 unid4 dog owl unid4 unid5 LINKS # All links work both ways: my %links; for (@links) { my ($left,$right) = @$_; # We store all links in arrays $links{$left} \|\|= []; $links{$right} \|\|= []; # Links work both ways push @{$links{$left}}, $right; push @{$links{$right}}, $left; } sub find_joins { my ($start, $goal, $prefix, $visited) = @_; $visited \|\|= {}; # our breadcrumb trail so we dont $prefix \|\|= []; # the path so far my $curr = $start; # warn "[@$prefix] $start => $goal"; my @results; # All the paths that lead from $start to $end $visited->{$start} = 1; for my $link (grep { ! $visited->{$_} } @{$links{$start}}) { if ($link eq $goal) { push @results, [@$prefix,$start,$goal]; } else { my %v = %$visited; push @results, find_joins($link,$goal,[@$prefix,$start],\% +v); } } # Sort by length, then asciibetically @results = sort { scalar @$a <=> scalar @$b or $a->[0] cmp $b->[0] + } @results; # print the whole thing with indices for $i ( 0 .. $#results ) { print "@{$results[$i]}\n"; } } find_joins( 'apple' => 'owl' ); [download] Output: `dbmathis@dbmathis-linux:~$ ./tmp.pl apple bug dog owl apple dog owl Use of uninitialized value in string comparison (cmp) at ./tmp.pl line + 46. Use of uninitialized value in string comparison (cmp) at ./tmp.pl line + 46. dbmathis@dbmathis-linux:~$` [download] After all this is over, all that will really have mattered is how we treated each other.	[reply] [d/l] [select]
Re^3: Advise on sql table mapping. by Corion (Patriarch) on May 19, 2008 at 19:56 UTC
The `print` statements will also try out to print stuff when no path can be found. You shouldn't print out stuff within `find_joins`, you should print out the results of the top-level call to it. Leaving the `warn` statement active would have helped you see the flow of control better. Why did you remove the `use strict;` line? Adding it again points to a potential error.	[reply] [d/l] [select]
Re^4: Advise on sql table mapping. by dbmathis (Scribe) on May 19, 2008 at 20:56 UTC
Re^5: Advise on sql table mapping. by Corion (Patriarch) on May 19, 2008 at 21:02 UTC
Some notes below your chosen depth have not been shown here
Re^2: Advise on sql table mapping. by dbmathis (Scribe) on May 20, 2008 at 03:05 UTC
Ok, here is my first attempt to converting your output into SQL. I have not added the table aliases into the code yet, and I am sure you can do this with far less code so let me know what you think. I think the function calling itself makes sense now. I assume you are doing this to populate prefix. I may be wrong though. #!/usr/bin/perl -w use strict; my @links = map { [/^(\w+)\s+(\w+)\s+(\w+)\s+(\w+)/] } split /\n/, <<L +INKS; apple bug unid1 unid2 apple cat unid1 unid3 bug dog unid2 unid4 apple dog unid1 unid4 dog owl unid4 unid5 LINKS # All links work both ways: my %links; for (@links) { my ($left,$right) = @$_; # We store all links in arrays $links{$left} \|\|= []; $links{$right} \|\|= []; # Links work both ways push @{$links{$left}}, $right; push @{$links{$right}}, $left; } sub find_joins { my ($start, $goal, $prefix, $visited) = @_; $visited \|\|= {}; # our breadcrumb trail so we dont go in circles $prefix \|\|= []; # the path so far my $curr = $start; #warn "[@$prefix] $start => $goal"; my @results; # All the paths that lead from $start to $end $visited->{$start} = 1; for my $link (grep { ! $visited->{$_} } @{$links{$start}}) { if ($link eq $goal) { push @results, [@$prefix,$start,$goal]; for my $i ( 0 .. $#results ) { for my $ii ( 0 .. $#{$results[$i]} ) { if ( $ii == 0 ) { print "$results[$i][$ii]\n"; } else { my $keys; for ( @links ) { my ( $left, $right, $lid, $rid ) = @$_; if ( $left eq $results[$i][$ii - 1] && $ri +ght eq $results[$i][$ii] ) { $keys = "on $lid = $rid"; } } print "inner join $results[$i][$ii] $keys\n"; } } print "\n"; } } else { my %v = %$visited; push @results, find_joins($link,$goal,[@$prefix,$start],\% +v); } } # Sort by length, then asciibetically @results = sort { scalar @$a <=> scalar @$b or $a->[0] cmp $b->[0] + } @results; } find_joins( 'apple' => 'owl' ); [download] `apple inner join bug on unid1 = unid2 inner join dog on unid2 = unid4 inner join owl on unid4 = unid5 apple inner join dog on unid1 = unid4 inner join owl on unid4 = unid5` [download] After all this is over, all that will really have mattered is how we treated each other.	[reply] [d/l] [select]
Re^3: Advise on sql table mapping. by Corion (Patriarch) on May 20, 2008 at 06:38 UTC
Exactly - `$prefix` contains the path taken so far, that's what the recursion does. Again, why are you trying to print out the stuff from within find_joins? Please, don't do that. Just don't. Take the results of `find_joins` and then, convert them to the output format. It's much easier that way instead of trying to make a recursive function output the stuff you want at the end. Your way of iterating over arrays is cumbersome, as you never need the array index anywhere: `for my $i ( 0 .. $#results ) {` [download] is easier written as `for my $element ( @results ) {` [download] But other than that, your code is close once you just take the data structure returned by `find_joins` and work on that: # Build a lookup table for "left_table;right_table" => [ join columns +] my %link_columns = map { $_->[0] . ";" . $_->[1] => [ $_->[2], $_->[3] ], $_->[1] . ";" . $_->[0] => [ $_->[3], $_->[2] ] } @links; my @results = find_joins( 'apple' => 'owl' ); die "No path from 'apple' to 'owl'" unless @results; my $shortest_path = $results[0]; my $start = shift @$shortest_path; print "SELECT * FROM "; print "$start\n"; my $left = $start; for my $table (@$shortest_path) { my $key = "$left;$table"; print "-- $key\n"; my $join_cols = $link_columns{$key}; print " inner join $table on $join_cols->[0] = $join_cols->[1]\n"; $left = $table; }; [download]	[reply] [d/l] [select]
Re: Advise on sql table mapping. by CountZero (Bishop) on May 18, 2008 at 17:16 UTC
I think you are missing som important info in your file: the type of the join. Are they all inner equi-joins? Or are some them left or right joins? It will make a hell of a difference as you could loose a lot of relevant records by assuming it are all inner equi-joins Also as soon as left or right joins are involved the solution is not symmetrical anymore as operations with outer joins (left or right) are not commutative. I'm not sure if they are associative, but a combination of inner and outer joins is certainly not associative. CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James	[reply]
Re^2: Advise on sql table mapping. by dbmathis (Scribe) on May 18, 2008 at 20:02 UTC
At this point left / right joins are not so important, however there will be situations where I will have "join table on a = b and b = c". I think for now I will work with getting a script functioning for basic joins and then move to something more complex. I already feel like I have jumped into the deep end without knowing how to swim :) After all this is over, all that will really have mattered is how we treated each other.	[reply]
Re: Advise on sql table mapping. by EvanCarroll (Chaplain) on May 17, 2008 at 21:55 UTC
For parsing the format you can use my DataExtract::FixedWidth to help you with this project, I should have the next version up today. `my $de = DataExtract::FixedWidth->new( cols => ['table 1', 'table 2' , 'key 1', 'key 2'], header_row => 'table 1 table 2 key 1 key 2' )` [download] The next version would be slightly easier for this format. `my $de = DataExtract::FixedWidth->new( heuristic => \@lines ); for ( @lines ) { next unless m/\w/; ## skips the definition row of ---- $de->parse_hash( $_ ) }` [download] Evan Carroll I hack for the ladies. www.EvanCarroll.com	[reply] [d/l] [select]
Re: Advise on sql table mapping. by dragonchild (Archbishop) on May 18, 2008 at 14:25 UTC
When I did something like this five years ago, I used Graph and the shortest path algorithms it provides. You can have the vertices be your tables and the edges describe the joins. That will preserve all the information you need. My criteria for good software: Does it work? Can someone else come in, make a change, and be reasonably certain no bugs were introduced?	[reply]