Dear Fellow Brother Monks,

Given a list of numbers sorted in ascending order: my @nlist = (0,1,2,3,4,5,6,8,10);
A list of candidate hash keys: my @key_list = ('A'..'Z');
And a tolerance value: my $tolerance = 1;


Update: The array @nlist may come with duplicate elements.
For example @nlist = (0,0,1,2,3,3,4,5,6,8,8,10);

I would like to cluster the numbers in @nlist into a hash of arrays. Each hash entry holds the elements that lie within the tolerance (+/- 1) of its centroid. An element may recur in several (overlapping) clusters, provided it is within the tolerance of each centroid. So finally I would have something like this:

my $VAR1 = {
    'A' => [0,1,2], # centroid is 1, with other elem 0 and 2 (1 +/- 1)
    'B' => [1,2,3],
    'C' => [2,3,4],
    'D' => [3,4,5],
    'E' => [4,5,6],
    'F' => [5,6],   # centroid is 6 but only with one elem 5 (i.e. 6-1)
    'G' => [8],     # next centroid is 8 but no members
    'I' => [10],    # same for centroid 10
};
# As the @nlist array grow larger the final hash will also grow larger.

So in principle every element in the array will be a centroid (except the first one, which in this case is 0). Each centroid then collects the members that lie within the prespecified tolerance value.

Update2:
I should add that when the very first element has no neighbour within tolerance, it forms a cluster of its own. In other words, we ignore the first element only when it has a neighbour within tolerance (see example below).

I am currently stuck with my code below. I really am not sure how to go about it.
use strict;
use Data::Dumper;
use Carp;

# Size of this array could be much greater than this
my @nlist = ( 0, 1, 2, 3, 4, 5, 6, 8, 10 );
# And the first element can be greater than 0

# This is a pre-generated key candidate.
# It may not be used up all of them
# In practice I will create a large key list,
# that should be greater than potential hash to be created
my @key_list = ( 'A' .. 'Z' );
my $tolerance = 1;

my $hoa;
foreach my $nlist (@nlist) {
    my @tmpar      = ( $nlist[0] );
    my $first_elem = $nlist[0];
    my $klist;
    if ( check_member( \@tmpar, $first_elem, $nlist, $tolerance ) == 1 ) {
        push @tmpar, $nlist;
        $klist = shift @key_list;
        push @{ $hoa->{$klist} }, @tmpar;
    }
}

print Dumper $hoa;

# -- Subroutine ------
sub check_member {
    # To check if a value can be
    # a member of an array
    my ( $alist, $fel, $snum, $tol ) = @_;
    my $centroid = $alist->[0];
    if (   $centroid - $tol == $snum
        or $centroid + $tol == $snum
        or $centroid == $snum and $centroid != $fel )
    {
        return 1;
    }
    return 0;
}
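
For comparison, here is a minimal sketch of one way the rules described above could be implemented. It is not a definitive solution. The sub name cluster_numbers is hypothetical, and the sketch rests on three assumptions that the description does not settle: duplicate values collapse to a single value before clustering, a member is any value whose distance from the centroid is at most the tolerance, and keys are consumed strictly in order (so the last cluster of the sample data lands under 'H').

```perl
use strict;
use warnings;

# Hypothetical helper: builds a hash of arrays, one cluster per centroid.
# $nlist: array ref of numbers sorted ascending (duplicates allowed)
# $keys : array ref of candidate hash keys
# $tol  : tolerance around each centroid
sub cluster_numbers {
    my ( $nlist, $keys, $tol ) = @_;

    # Assumption: duplicates are collapsed; order is preserved.
    my %seen;
    my @nums = grep { !$seen{$_}++ } @$nlist;

    my @avail = @$keys;    # copy, so the caller's key list is untouched
    my %hoa;

    for my $i ( 0 .. $#nums ) {
        my $centroid = $nums[$i];

        # Every value within +/- $tol of the centroid, centroid included.
        my @members = grep { abs( $_ - $centroid ) <= $tol } @nums;

        # The first element is skipped only when it has a neighbour.
        next if $i == 0 && @members > 1;

        my $key = shift @avail;
        die "Ran out of candidate keys\n" unless defined $key;
        $hoa{$key} = \@members;
    }
    return \%hoa;
}

my @nlist    = ( 0, 0, 1, 2, 3, 3, 4, 5, 6, 8, 8, 10 );
my @key_list = ( 'A' .. 'Z' );
my $clusters = cluster_numbers( \@nlist, \@key_list, 1 );

print "$_ => [@{ $clusters->{$_} }]\n" for sort keys %$clusters;
```

With the sample data this prints A => [0 1 2] through F => [5 6], then G => [8] and H => [10]. The inner grep rescans the whole list for every centroid, which is quadratic. Because the input is sorted, a sliding window over the neighbouring indices would probably scale better for large arrays.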

Regards,
Edward

In reply to Clustering Numbers with Overlapping Members by monkfan
