Finding an optimal solution may not be possible without searching the complete space of 2^100 possible column subsets, but you might try something like this:
1) For each column, count the number of different values in it (easily done with hashes)
2) Sort the column list by that count, lowest number first
3) Start with a key made up of all columns
4) For each column on that list, try to remove it from the key and check whether any two lines now have the same key. If yes, re-add the column to your key; otherwise leave it off
This will give you exactly one solution, and not necessarily the smallest one. If you want to look for better solutions, you might rerun the algorithm with the sorted list of columns scrambled somewhat, so that it arrives at different solutions. You could even use a totally random column list, but most likely you would get too many suboptimal solutions that way.
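
Here is a rough sketch of the steps above, in Python rather than Perl (sets and dicts play the role of hashes). It assumes the data is a list of equal-length rows and that no two rows are completely identical; the names find_key and best_of_random_orders are made up for illustration. The rerun helper simply shuffles the column order fully and keeps the shortest key it sees, which is only one way to do the scrambling.

```python
import random

def find_key(rows, order=None):
    """Greedy reduction: start from all columns and drop every column
    whose removal still leaves all rows distinguishable."""
    ncols = len(rows[0])
    if order is None:
        # Steps 1 and 2: count distinct values per column, lowest count first.
        distinct = [len({row[c] for row in rows}) for c in range(ncols)]
        order = sorted(range(ncols), key=lambda c: distinct[c])
    key = set(range(ncols))  # Step 3: the key starts as every column.
    for c in order:  # Step 4: try to drop each column in turn.
        trial = sorted(key - {c})
        projected = {tuple(row[i] for i in trial) for row in rows}
        if len(projected) == len(rows):  # no two rows collide, so c is not needed
            key.remove(c)
    return sorted(key)

def best_of_random_orders(rows, tries=20, seed=0):
    """Rerun the greedy pass with shuffled column orders; keep the shortest key."""
    rng = random.Random(seed)
    best = find_key(rows)
    cols = list(range(len(rows[0])))
    for _ in range(tries):
        rng.shuffle(cols)
        candidate = find_key(rows, order=list(cols))
        if len(candidate) < len(best):
            best = candidate
    return best

# Made-up sample data: column 0 alone identifies every row.
rows = [
    (1, "a", "x"),
    (2, "a", "y"),
    (3, "b", "x"),
]
print(find_key(rows))
print(best_of_random_orders(rows))
```

Each pass costs one scan of the data per column, so even with 100 columns a few dozen reruns should stay cheap compared with an exhaustive search.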
In reply to Re: Finding Keys from data
by jethro
in thread Finding Keys from data
by aartist