Re: Algorithm concerns

It might be easier if you broke your problem into several simpler parts, and then to combine the parts. For example, you could break down your problem and generalize it like this:

Normalize the values of each attribute into a common set of units.
Combine the values of all of one person's atributes to get a total for each group member.
Sort the members of the group by this attribute total.
Choose the top X members of the group.

Algorighmically, the only thing here that isn't O(N) is the sorting. That's not so bad.

On normalization: It really depends on the nature of your attributes, as to what kind of normalization makes sense. One easy way would be to arbitrarily assign the maximum value of any one attribute the value "1", and scale the rest of the values of that attribute from 0 to 1. But again, this probably wouldn't make sense in all cases- certainly not with heights and weights.

As for the perlish aspects: I'd recommend storing each person as a hash, with each attribute name and value as a key-value pair in that hash. That way you can add more attributes, or data about the attributes (normalized values, sums, etc) into each person without adding more and more arrays into your code. Other than that, once you decide on a solution post some code and I'm sure someone will be happy to help :)

Alan

Comment on Re: Algorithm concerns

Replies are listed 'Best First'.
Re: Re: Algorithm concerns by rinceWind (Monsignor) on May 03, 2002 at 14:22 UTC
As a useful tip to doing the first of these (normalise units), take a look at Data::Dimensions	[reply]
Re: Re: Algorithm concerns by Anonymous Monk on May 03, 2002 at 16:00 UTC
Shortly after I posted, I had the same inclination.. normalizing, and sorting. I don't have a great deal of experience with normalizing data - would it be correct to say that the normalized data sets would have to have a similar distribution in order for the sum of the two values to represent a maximum? Or perhaps if they did not have similar distributions (For example, if height were one attribute and number of teeth were the other.. unless we're dealing with a hockey team height is going to have a much broader distribution of values), the normalized scale could somehow reflect that fact? Any suggestions on handling that task? Thanks, Terwin	[reply]
Re: Re: Re: Algorithm concerns by ferrency (Deacon) on May 03, 2002 at 16:20 UTC
I'm no statistician, but my gut reaction is that you're probably comparing apples and oranges, so any comparison you come up with is not likely to be very useful. As you suggest, data distribution will play a large factor here. In my earlier post, I meant to use the word "normalization" in a more generic sense: instead of simply scaling the numbers so they fit into a predictable range of values, you need to convert all values to "universal units" which can be compared directly. Depending on your attributes, this could be quite complex: comparing a weight versus a height doesn't necessarily make sense, as you observed. Those "universal units" might not be inches or pounds; they might be "deviation from average" or something else which is unit-independant. But the problem will still be easier if you can separate this data conversion step from the other parts of the process. As far as getting "the right answer," I think you'd have better luck asking someone with a high level of Statistics skill. Alan.	[reply]