in reply to Re: Finding Keys from data
in thread Finding Keys from data

Why 10^257? A column is either in or not in the key. That would mean 2^100 possible solutions to try

Or are you refering only to the worst case of your algorithm? Seems using permutation/variation might lead to much overhead in the search. If the combination 2:4 were already tested for example, testing 4:2 is not necessary anymore

Replies are listed 'Best First'.
Re^3: Finding Keys from data
by BrowserUk (Patriarch) on Mar 31, 2011 at 17:53 UTC
    Why 10^257? A column is either in or not in the key. That would mean 2^100 possible solutions to try

    For a unique key 'ab' is different to 'ba'.

    So, it should be 100! + 99! + 98! + 97! ... so I rounded up 10^157.

    To be honest, given the scale of the numbers, it makes little difference.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      I believe jethro is thinking about the problem actually proposed while BrowserUk is talking about a different problem. Of course, the root node is not completely clear on this point, so it could be that BrowserUk is correct and several others who have replied are the ones who are wrong.

      BrowserUk is picking a (potentially) different set of columns for each row while I believe a single list of columns is desired such that this same list of columns (in the same order) provide a unique key for each row.

      With a single list of columns and a useful delimiter between values, then the order no longer matters and only 2**100 potential solutions exist. Of course, you can also envision data where there is no completely effective delimiter (unless you also provide for escaping parts of values when the key is constructed). I choose to assume either an easy delimiter choice or building the key with potential escaping such as constructing the key using standard CSV format.

      - tye