Re: Re: Algorithm concerns

Shortly after I posted, I had the same inclination: normalizing and sorting.

I don't have a great deal of experience with normalizing data - would it be correct to say that the normalized data sets would have to have a similar distribution in order for the sum of the two values to represent a maximum?

Or perhaps, if they did not have similar distributions (for example, if height were one attribute and number of teeth were the other; unless we're dealing with a hockey team, height is going to have a much broader distribution of values), the normalized scale could somehow reflect that fact?

Any suggestions on handling that task?

Thanks,

Terwin


Replies are listed 'Best First'.
Re: Re: Re: Algorithm concerns
by ferrency (Deacon) on May 03, 2002 at 16:20 UTC

I'm no statistician, but my gut reaction is that you're probably comparing apples and oranges, so any comparison you come up with is not likely to be very useful. As you suggest, data distribution will be a large factor here.

In my earlier post, I meant to use the word "normalization" in a more generic sense: instead of simply scaling the numbers so they fit into a predictable range of values, you need to convert all values to "universal units" which can be compared directly. Depending on your attributes, this could be quite complex: comparing a weight versus a height doesn't necessarily make sense, as you observed. Those "universal units" might not be inches or pounds; they might be "deviation from average" or something else which is unit-independent.

But the problem will still be easier if you can separate this data conversion step from the other parts of the process.

As far as getting "the right answer," I think you'd have better luck asking someone with a high level of Statistics skill.

Alan.
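
One way to read the "deviation from average" idea is as a z-score: each value is expressed as the number of standard deviations it sits from its attribute's mean, so a broad attribute (height) and a narrow one (number of teeth) end up on the same unit-free scale. The sketch below is only an illustration under that assumption, not a statistically vetted answer. The function names, the record layout, and the sample data are all hypothetical, and the conversion step is kept separate from the ranking step as suggested above.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Express each value as standard deviations from the mean (unit-free)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values are identical, so none deviates from the average.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def rank_by_combined_score(records, attributes):
    """Sort records by the sum of their per-attribute z-scores, highest first."""
    # Conversion step: one list of z-scores per attribute.
    columns = {a: z_scores([r[a] for r in records]) for a in attributes}
    # Ranking step: add the converted values and sort.
    scored = [
        (sum(columns[a][i] for a in attributes), r)
        for i, r in enumerate(records)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Hypothetical sample data, for illustration only.
players = [
    {"name": "A", "height_cm": 170, "teeth": 32},
    {"name": "B", "height_cm": 190, "teeth": 30},
    {"name": "C", "height_cm": 180, "teeth": 28},
]

ranked = rank_by_combined_score(players, ["height_cm", "teeth"])
for total, player in ranked:
    print(f"{player['name']}: {total:+.3f}")
```

Whether the sum of two z-scores is a meaningful "maximum" still depends on the attributes and their distributions; the sketch only removes the units, which is the part that can be done mechanically.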
