Say C1 has average 0.8 but only 1 member (excluding centroid), and C2 has average 0.7 with 5 members, we would weight C2 as better than C1.
The problem is, you aren't specifying how you are scoring that.
- If you go for the highest average (as I was), then 0.8*1/1 > 0.7*5/5, so C1 wins.
- If you go for the highest total score, which makes 3.5 beat 0.8, then you will never remove anything from the set, because any removal would reduce the total.
Put another way:
- Allowing irregular subsets, the 'cluster' of 9 1.0s that form the major diagonal gives you an average of 1.0 and a size of 9. This can never be beaten (scoring by the average), as adding any value less than 1.0 will decrease the average.
- But if you use the total, you'll never remove anything from the set as any removal will reduce that total.
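To make the two scoring rules concrete, here is a minimal Python sketch. The function names are mine, the five 0.7 values are just one set of numbers with the stated average, and the 0.9 is an arbitrary "lesser" value; none of this is anyone's actual implementation. It assumes similarities are non-negative.

```python
def average_score(sims):
    """Score a cluster by the mean similarity of its members."""
    return sum(sims) / len(sims)

def total_score(sims):
    """Score a cluster by the summed similarity of its members."""
    return sum(sims)

# Illustrative stand-ins for the two clusters discussed above.
c1 = [0.8]        # 1 member, average 0.8
c2 = [0.7] * 5    # 5 members, average 0.7 (values assumed equal)

# The two rules disagree about which cluster is "better".
best_by_average = max([c1, c2], key=average_score)   # picks c1
best_by_total = max([c1, c2], key=total_score)       # picks c2

# Average rule: the 9 diagonal 1.0s cannot be improved by adding anything.
diagonal = [1.0] * 9
grown = diagonal + [0.9]   # hypothetical lesser value added
assert average_score(grown) < average_score(diagonal)

# Total rule: dropping any member with a positive value lowers the score,
# so nothing is ever pruned.
shrunk = c2[:-1]
assert total_score(shrunk) < total_score(c2)
```

Under the average rule the single-member C1 wins and the set never grows; under the total rule C2 wins and the set never shrinks.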
Unless you introduce some other metric or heuristic, you don't have a scoring mechanism that reflects your stated goals?
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.