in reply to Finding Keys from data
Finding an optimal solution may not be possible (without searching the complete space of 2^100 possibilities), but you might try something like this:
1) For each column, count the number of different values in it (easily done with hashes)
2) Sort the list of columns by that count, lowest number first
3) Start with a key made up of all columns
4) For each column on that list, try removing it from the key and check whether any two lines now have the same key. If so, re-add the column to your key; otherwise leave it off
This will give you exactly one solution. If you want to look for better solutions, you might rerun the algorithm with the sorted list of columns scrambled somewhat, so that you arrive at different solutions. You could even use a totally random column list, but that way you will most likely be presented with too many suboptimal solutions.
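For what it's worth, here is a rough sketch of the steps above in Python (sets stand in for the hashes). It assumes the data fits in memory as a list of equal-length tuples. The names `distinct_counts`, `is_unique`, `greedy_key` and `best_of_shuffles` are made up for this illustration, and the rerun helper uses a fully random column order and simply keeps the shortest key it finds, which is only one of several ways to do the scrambling.

```python
import random

def distinct_counts(rows, n_cols):
    """Step 1: number of different values in each column."""
    return [len({row[c] for row in rows}) for c in range(n_cols)]

def is_unique(rows, cols):
    """True if no two rows agree on all of the given columns."""
    seen = set()
    for row in rows:
        k = tuple(row[c] for c in cols)
        if k in seen:
            return False
        seen.add(k)
    return True

def greedy_key(rows, order=None):
    """Steps 2-4: drop columns one at a time while the key stays unique."""
    n_cols = len(rows[0])
    if order is None:
        # Step 2: columns with the fewest distinct values are tried first.
        counts = distinct_counts(rows, n_cols)
        order = sorted(range(n_cols), key=lambda c: counts[c])
    key = list(range(n_cols))  # Step 3: start with all columns.
    if not is_unique(rows, key):
        raise ValueError("duplicate rows: no set of columns can be a key")
    for col in order:          # Step 4: try to remove each column in turn.
        trial = [c for c in key if c != col]
        if is_unique(rows, trial):
            key = trial        # Still unique without it, so leave it off.
    return key

def best_of_shuffles(rows, tries=20, seed=0):
    """Rerun with random column orders and keep the shortest key found."""
    rng = random.Random(seed)
    best = greedy_key(rows)
    cols = list(range(len(rows[0])))
    for _ in range(tries):
        rng.shuffle(cols)
        candidate = greedy_key(rows, order=list(cols))
        if len(candidate) < len(best):
            best = candidate
    return best

rows = [
    ("a", 1, "x", 10),
    ("a", 2, "x", 20),
    ("b", 1, "y", 30),
    ("b", 2, "y", 40),
]
print(greedy_key(rows))                      # [3]
print(greedy_key(rows, order=[3, 0, 1, 2]))  # [1, 2]
```

The second call shows why the order matters: trying the most selective column (index 3) first throws it away, and the result is a two-column key instead of a one-column key. Every uniqueness check walks all rows, so with many columns and lines this is slow, but it is a long way from 2^100.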
Re^2: Finding Keys from data
by roboticus (Chancellor) on Apr 01, 2011 at 10:57 UTC
by jethro (Monsignor) on Apr 01, 2011 at 11:39 UTC
by roboticus (Chancellor) on Apr 01, 2011 at 12:12 UTC