As bobf points out, your problem is very much like a statistical clustering task, where the object is to figure out how data is or should be grouped. If finding statistical clusters is your underlying task, and you want to find something close to an optimal clustering, then Algorithm::Cluster is a good bet. It will, however, involve installing the underlying C software and investing some time learning about clustering.

On the other hand, you may just want to do what you said, and be satisfied with any grouping that meets the grouping criterion, even if it's a poor grouping. If any clustering is good enough, then you can just go through the sorted list computing an average (or a running average). Each time you look at a new point, check whether the new average with that point included puts the first point in the group more than the 'X' distance from the average, or puts the new point you're about to add more than that distance away. If the new point expands the group too much, then all points up to that one form a group, and the new point is the first of the next group.

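    # plain mean of the values passed in
    sub average { my $sum = 0; $sum += $_ for @_; return $sum / @_ }

    my @refsOfGroups = ();
    my @group;
    my $avg;
    my $point;
    while ( @pointValues ) {
        $point = shift @pointValues;
        $avg   = average( @group, $point );
        # only test once the group has a first point to compare against
        if ( @group
            && (   ( $avg - $group[0] > $x_threshold )
                || ( $point - $avg > $x_threshold ) ) )
        {
            push @refsOfGroups, [@group];
            @group = ();
        }
        push @group, $point;
    } # while
    # include the last group
    push @refsOfGroups, [@group] if @group;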
(The code is untested, but it's pretty simple.)

Note that if you have a lot of points and a lot of big groups, the computation of average() can switch to a running average, where you don't add up all the points each time; you just add in the new value. (This also helps prevent overflow when the sums in the average get too big.) Also note that you don't have to take the absolute value of the difference from $avg in the if(), because the list is sorted, so you know $avg is at least as big as the first point in the group and no larger than the one you just included in the average.
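
Here is a rough sketch of what the running-average variant might look like. The sub name group_by_running_mean and its argument order are made up for illustration, and it assumes the points are already sorted in ascending order. Instead of re-adding the whole group, it nudges the current mean toward each new point, so no large sum is ever held:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): split an
# ascending-sorted list into groups using an incrementally updated mean.
sub group_by_running_mean {
    my ( $threshold, @sorted ) = @_;
    my ( @groups, @group );
    my $mean;
    for my $point (@sorted) {
        # mean the group would have if $point joined it
        my $candidate =
            @group ? $mean + ( $point - $mean ) / ( @group + 1 ) : $point;
        if ( @group
            && (   $candidate - $group[0] > $threshold
                || $point - $candidate > $threshold ) )
        {
            # $point stretches the group too far: close it, start a new one
            push @groups, [@group];
            @group     = ();
            $candidate = $point;
        }
        push @group, $point;
        $mean = $candidate;
    }
    push @groups, [@group] if @group;    # include the last group
    return @groups;
}

my @groups = group_by_running_mean( 2, 1, 2, 3, 10, 11, 12, 30 );
print join( ' ', @$_ ), "\n" for @groups;
```

With a threshold of 2, the sample list should come out as three groups: 1 2 3, then 10 11 12, then 30 on its own.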

I hope this is what you need, or that it's good enough, because this is simple to do, and statistical clustering isn't, even if someone else has written the routines. Good luck with your clustering.


In reply to Re: Grouping numbers by rodion
in thread Grouping numbers by tamaguchi
