Re: Grouping numbers

As bobf points out, your problem is very much like a statistical clustering task, where the object is to figure out how data is or should be grouped. If finding statistical clusters is your underlying task, and you want to find something close to an optimal clustering, then the Algorithm::Cluster module is a good bet. It will, however, involve installing the underlying C software, and investing some time learning about clustering.

On the other hand, you may just want to do what you said, and be satisfied with any grouping that meets the grouping criterion, even if it's a poor grouping. If any clustering is good enough, then you can just go through the sorted list computing an average (or a running average). Each time you look at a new point, compute what the group's average would be with that point included, then check whether that new average puts the first point in the group more than the 'X' distance from the average, or puts the new point itself more than that distance away. If the new point expands the group too much, then all points up to that one form a group, and the new point is the first of the next group.

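my @refsOfGroups = ();
my @group;
my $avg;
my $point;

# @pointValues must already be sorted in ascending order.
while (@pointValues) {
   $point = shift @pointValues;
   $avg = average(@group, $point);
   if (   @group    # never close an empty group
       && (   ($avg   - $group[0] > $x_threshold)
           || ($point - $avg      > $x_threshold))
      ) {
      push @refsOfGroups, [@group];   # close the current group
      @group = ();
   }
   push @group, $point;   # $point joins, or starts, the current group
} # while

push @refsOfGroups, [@group] if @group;   # include the last group

# Arithmetic mean of a non-empty list.
sub average {
   my $sum = 0;
   $sum += $_ for @_;
   return $sum / @_;
}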

(The code is untested, but it's pretty simple.)

Note that if you have a lot of points and a lot of big groups, average() can switch to a running average: instead of adding up all the points each time, you just fold in the new value. (If you update the average itself rather than keeping a running sum, this also helps prevent overflow when the sums get too big.) Also note that you don't have to take the absolute value of the differences from $avg in the if(), because the list is sorted, so $avg is at least as big as the first point in the group and no larger than the one you just included in the average.
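
Here is a rough sketch of what the running-average version might look like. The sub name group_by_running_mean and the sample numbers are made up for illustration, and it assumes the list is already sorted in ascending order. It updates the mean incrementally (new mean = old mean + (point - old mean) / count) instead of summing the whole group again for every point.

```perl
use strict;
use warnings;

# Hypothetical helper: split an ascending-sorted list into groups in which
# neither the first nor the last member lies more than $x_threshold from
# the group's mean. Assumes the input is already sorted.
sub group_by_running_mean {
    my ($x_threshold, @sorted) = @_;
    my @groups;      # finished groups, one array ref each
    my @group;       # group currently being built
    my $mean = 0;    # mean of @group (unused while @group is empty)

    for my $point (@sorted) {
        # Mean the group would have if $point joined it.
        my $new_mean = $mean + ($point - $mean) / (@group + 1);

        if (@group
            && (   $new_mean - $group[0] > $x_threshold
                || $point - $new_mean    > $x_threshold))
        {
            push @groups, [@group];    # close the current group
            @group    = ();
            $new_mean = $point;        # a one-point group's mean is the point
        }
        push @group, $point;
        $mean = $new_mean;
    }
    push @groups, [@group] if @group;  # include the last group
    return @groups;
}

# Sample data, threshold of 2.
my @groups = group_by_running_mean(2, 1, 2, 3, 10, 11, 30);
print join(' | ', map { join(',', @$_) } @groups), "\n";
# prints: 1,2,3 | 10,11 | 30
```

Because the mean is carried along from point to point, each new point costs a constant amount of work no matter how big the group has grown.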

I hope this is what you need, or that it's good enough, because this is simple to do, and statistical clustering isn't, even if someone else has written the routines. Good luck with your clustering.
