
Dear GP,

Are duplicates allowed in the number list and are numbers constrained to be integer?
Yes, there can be duplicates, but the array is always sorted in ascending order. For example, given @nlist = (0,0,1,2,3,3,4,5,6,8,8,10); we would like to have these clusters:
# I have manually aligned this based on the centroid
my $VAR1 = {
    'A' => [0,0,1],
    'B' => [0,0,1,2],
    'C' => [1,2,3,3],
    'D' => [2,3,3,4],
    'E' => [3,3,4,5],
    'F' => [4,5,6],
    'G' => [5,6],
    'H' => [8,8],
    'I' => [10],
};
I'm sorry for not being clear in the first place.
What happens to hash keys if there are more than 26 keys?
As stated in my snippet, I have made sure that the clusters won't need more than 26 keys.
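Should the 26-key limit ever be exceeded, one possible workaround, not used anywhere in this thread, is to let the range operator run on to two-letter labels. The sketch below assumes labels such as 'AA' are acceptable; the helper name by_label is made up for illustration.

```perl
use strict;
use warnings;

# Assumption: two-letter labels are acceptable once 'A'..'Z' run out.
# The range operator uses Perl's magical string increment, so this
# produces A..Z followed by AA..ZZ (702 labels in total).
my @key_list = ('A' .. 'ZZ');

# A plain string sort would put 'AA' before 'B'; comparing by length
# first returns the labels in the order they were handed out.
sub by_label { length($a) <=> length($b) or $a cmp $b }

# Take the first 28 labels and sort them back into hand-out order.
my @labels = sort by_label @key_list[0 .. 27];
print "@labels\n";
```

Note that the final report would then need the same length-first sort in place of a bare sort keys %hoa.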

Regards,
Edward

Re^3: Clustering Numbers with Overlapping Members
by Hofmator (Curate) on Aug 07, 2006 at 14:01 UTC
    OK, if you want this behaviour, then the following code - slightly modified from my previous posting - should work.
    use warnings;
    use strict;

    my @nlist = (0,0,1,2,3,3,4,5,6,8,8,10);
    my @key_list = ('A'..'Z');
    my $tolerance = 1;
    my %hoa;
    my %uniq;

    @uniq{@nlist[1..$#nlist]} = ();

    for my $centroid (sort {$a <=> $b} keys %uniq) {
        my $key = shift @key_list;
        $hoa{$key} = [grep in_range($centroid, $_), @nlist ];
    }

    print "$_ => [@{$hoa{$_}}]\n" for sort keys %hoa;

    sub in_range {
        my ($centroid, $testnum) = @_;
        return abs($centroid - $testnum) <= $tolerance;
    }
    The idea is to iterate over the (unique) centroids - ignoring the very first element in @nlist - and extract all numbers 'in_range' from the original array.
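The same idea can be wrapped in a small subroutine, which makes the result easy to check against the clusters requested earlier in the thread. This is only a sketch: the name cluster_by_centroid and its interface are assumptions made for illustration, not part of the posted code.

```perl
use strict;
use warnings;

# Hypothetical helper (name and interface are assumptions, not from the thread).
# Takes a reference to a sorted number list and a tolerance; returns a hash
# reference mapping 'A', 'B', ... to array references of cluster members.
# Like the posted code, it assumes no more than 26 clusters are needed.
sub cluster_by_centroid {
    my ($nlist, $tolerance) = @_;
    my @key_list = ('A' .. 'Z');

    # Unique centroids, skipping the very first element of the list.
    my %uniq;
    @uniq{ @{$nlist}[ 1 .. $#{$nlist} ] } = ();

    my %hoa;
    for my $centroid (sort { $a <=> $b } keys %uniq) {
        my $key = shift @key_list;
        # Collect every number within $tolerance of this centroid.
        $hoa{$key} = [ grep { abs($centroid - $_) <= $tolerance } @{$nlist} ];
    }
    return \%hoa;
}

my $clusters = cluster_by_centroid([0,0,1,2,3,3,4,5,6,8,8,10], 1);
print "$_ => [@{ $clusters->{$_} }]\n" for sort keys %$clusters;
```

For the sample list this gives nine clusters, A through I. The value 0 still acts as a centroid because only the first of the two zeros is skipped.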

    Update: Small bugfix.

    -- Hofmator

      Hofmator,

      One more small thing; I hope you won't mind looking at it. I should add that when the very first element has no neighbour within tolerance, it forms a cluster of its own. In other words, we ignore the first element only when it has a neighbour within tolerance. For example:
      my @nlist = ( 2,4,5,6,7 );
      my $tolerance = 1;

      We would like to have:

      A => [2]
      B => [4 5]
      C => [4 5 6]
      D => [5 6 7]
      E => [6 7]
      How can I modify your code to accommodate this?

      Update: I think I got it.
      #my @nlist = (0,0,1,2,3,3,4,5,6,8,8,10);
      #my @nlist = (0,1,2,3,4,5,6,8,10);
      my @nlist = (2,4,5,6,7);
      my @key_list = ('A'..'Z');
      my $tolerance = 1;
      my %hoa;
      my %uniq;

      # Check if first element has a neighbour
      if ( felem_has_nbr( $nlist[0], $nlist[1],$tolerance ) == 1 ) {
          @uniq{ @nlist[ 1 .. $#nlist ] } = ();
      }
      else {
          @uniq{ @nlist[ 0 .. $#nlist ] } = ();
      }

      for my $centroid ( sort { $a <=> $b } keys %uniq ) {
          my $key = shift @key_list;
          $hoa{$key} = [ grep in_range( $centroid, $_ ), @nlist ];
      }

      print "$_ => [@{$hoa{$_}}]\n" for sort keys %hoa;

      sub in_range {
          my ( $centroid, $testnum ) = @_;
          return abs( $centroid - $testnum ) <= $tolerance;
      }

      sub felem_has_nbr {
          my ( $felem, $sec_in_arr, $tl ) = @_;
          abs( $felem - $sec_in_arr ) <= $tl ? return 1 : return 0;
      }

      Regards,
      Edward
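The first-element rule described in this reply can also be folded into a single subroutine. The sketch below is one possible arrangement, not the code from the thread: the name cluster_with_first_rule is hypothetical, and the extra length check is an added assumption so that a one-element list does not compare against an undefined second element.

```perl
use strict;
use warnings;

# Hypothetical helper; name and interface are assumptions for illustration.
sub cluster_with_first_rule {
    my ($nlist, $tolerance) = @_;
    my @key_list = ('A' .. 'Z');

    # Skip the first element only if a second element exists and lies
    # within $tolerance of it; otherwise the first element is a centroid too.
    my $skip_first = @{$nlist} > 1
        && abs($nlist->[0] - $nlist->[1]) <= $tolerance;
    my $start = $skip_first ? 1 : 0;

    my %uniq;
    @uniq{ @{$nlist}[ $start .. $#{$nlist} ] } = ();

    my %hoa;
    for my $centroid (sort { $a <=> $b } keys %uniq) {
        my $key = shift @key_list;
        # Collect every number within $tolerance of this centroid.
        $hoa{$key} = [ grep { abs($centroid - $_) <= $tolerance } @{$nlist} ];
    }
    return \%hoa;
}

# Two sample inputs: the list from this reply and a one-element list.
for my $list ([2,4,5,6,7], [10]) {
    my $clusters = cluster_with_first_rule($list, 1);
    print "$_ => [@{ $clusters->{$_} }]\n" for sort keys %$clusters;
    print "\n";
}
```

With (2,4,5,6,7) and a tolerance of 1, the first element 2 has no neighbour in range, so it becomes cluster A on its own, followed by the four clusters B through E listed in the example.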