in reply to Grouping numbers
On the other hand, you may just want to do what you said, and be satisfied with any grouping that meets the grouping criterion, even if it's a poor grouping. If any clustering is good enough, then you can just go through the sorted list computing an average (or a running average). Each time you look at a new point, check whether the new average, with that point included, puts the first point in the group more than 'X' away from the average, or puts the new point you're going to add more than 'X' away. If the new point expands the group too much, then all points up to that one form a group, and the new point is the first of the next group.
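(The code is untested, but it's pretty simple.)

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Arithmetic mean of a list of numbers.
sub average { return sum(@_) / @_ }

# Assumes @pointValues is already sorted in ascending order and
# $x_threshold holds the maximum allowed distance 'X'.
my @refsOfGroups = ();
my @group;
my $avg;
my $point;
while (@pointValues) {
    $point = shift @pointValues;
    $avg   = average( @group, $point );
    # Only test once the group has a first point to compare against.
    if ( @group
        && (   ( $avg - $group[0] > $x_threshold )
            || ( $point - $avg > $x_threshold ) ) )
    {
        # The new point stretches the group too far: close the current group.
        push @refsOfGroups, [@group];
        @group = ();
    }
    push @group, $point;
}    # while
# include the last group
push @refsOfGroups, [@group] if @group;
```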
Note that if you have a lot of points and a lot of big groups, the computation of average() can switch to a running average, where instead of adding up all the points each time, you just fold in the new value. (If you update the mean itself rather than keeping an ever-growing sum, this also helps prevent overflow when the sums would otherwise get too big.) Also note that you don't have to take the absolute value of the difference from $avg in the if(), because the list is sorted: you know $avg is at least as big as the first point in the group, and no larger than the one you just included in the average.
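Here is a rough sketch of that running-average variant. It wraps the loop in a hypothetical sub, group_points(), that takes the threshold followed by the sorted points, and it swaps the average() call for an incremental mean update (new mean = old mean + (x - old mean) / n), so each point costs constant work and no large sum is ever built. The grouping test is the same one used above.

```perl
use strict;
use warnings;

# Hypothetical helper: group a sorted list of numbers so that, when each
# point is added, neither the group's first point nor the new point lies
# more than $x_threshold from the group's mean. The mean is updated
# incrementally instead of re-summing the whole group for every point.
sub group_points {
    my ( $x_threshold, @pointValues ) = @_;
    my @refsOfGroups;
    my @group;
    my $avg;    # mean of the points currently in @group

    for my $point (@pointValues) {
        if ( !@group ) {
            # First point of a new group: the mean is just that point.
            @group = ($point);
            $avg   = $point;
            next;
        }
        # Mean of @group plus $point, computed from the old mean;
        # (@group + 1) is the new group size, counting $point.
        my $newAvg = $avg + ( $point - $avg ) / ( @group + 1 );
        if (   $newAvg - $group[0] > $x_threshold
            || $point - $newAvg > $x_threshold )
        {
            # $point would stretch the group too far: it starts the next one.
            push @refsOfGroups, [@group];
            @group = ($point);
            $avg   = $point;
            next;
        }
        push @group, $point;
        $avg = $newAvg;
    }
    push @refsOfGroups, [@group] if @group;    # include the last group
    return \@refsOfGroups;
}

my $groups = group_points( 2, 1, 2, 3, 10, 11, 20 );
print join( ' | ', map { "@$_" } @$groups ), "\n";    # prints "1 2 3 | 10 11 | 20"
```

With a threshold of 2, adding 10 to the group (1, 2, 3) would move the mean to 4, which is 3 away from the first point, so 10 starts a new group. The same thing happens when 20 is considered for (10, 11).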
I hope this is what you need, or that it's good enough, because this is simple to do, and statistical clustering isn't, even if someone else has written the routines. Good luck with your clustering.