in reply to Re^2: Clustering Numbers with Overlapping Members
in thread Clustering Numbers with Overlapping Members

OK, if you want this behaviour, then the following code - slightly modified from my previous posting - should work.
use warnings;
use strict;

my @nlist     = (0,0,1,2,3,3,4,5,6,8,8,10);
my @key_list  = ('A'..'Z');
my $tolerance = 1;

my %hoa;
my %uniq;
@uniq{@nlist[1..$#nlist]} = ();

for my $centroid (sort {$a <=> $b} keys %uniq) {
    my $key = shift @key_list;
    $hoa{$key} = [grep in_range($centroid, $_), @nlist ];
}

print "$_ => [@{$hoa{$_}}]\n" for sort keys %hoa;

sub in_range {
    my ($centroid, $testnum) = @_;
    return abs($centroid - $testnum) <= $tolerance;
}
The idea is to iterate over the (unique) centroids - ignoring the very first element in @nlist - and extract all numbers 'in_range' from the original array.
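
For anyone who wants to reuse this, here is a minimal sketch of the same idea wrapped in a function. The name cluster_by_centroid and the choice to return a list of array references (instead of filling a letter-keyed hash) are my own assumptions for illustration, not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper (name and return shape are assumptions, not from the
# post above): returns one array ref per unique centroid, in ascending order.
sub cluster_by_centroid {
    my ($tolerance, @numbers) = @_;

    # Collect the unique centroid candidates, skipping the very first
    # element of the input, exactly as the posted code does.
    my %seen;
    @seen{ @numbers[1 .. $#numbers] } = ();

    my @clusters;
    for my $centroid (sort { $a <=> $b } keys %seen) {
        # Every input number within $tolerance of the centroid joins its cluster.
        push @clusters, [ grep { abs($centroid - $_) <= $tolerance } @numbers ];
    }
    return @clusters;
}

# Usage: label the clusters A, B, C, ... only when printing.
my @clusters = cluster_by_centroid(1, 0, 0, 1, 2, 3, 3, 4, 5, 6, 8, 8, 10);
my @labels   = ('A' .. 'Z');
print "$labels[$_] => [@{ $clusters[$_] }]\n" for 0 .. $#clusters;
```

For the sample list this prints nine clusters, from A => [0 0 1] through I => [10]. Returning a plain list keeps the clustering independent of the labels, so the function itself does not break if there are ever more than 26 centroids; only the printing line would need more labels.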

Update: Small bugfix.

-- Hofmator

Re^4: Clustering Numbers with Overlapping Members
by monkfan (Curate) on Aug 07, 2006 at 14:35 UTC
    Hofmator,

    One more small thing; I hope you won't mind looking at it. I should add that when the very first element has no neighbour within tolerance, it forms a cluster of its own. In other words, we ignore the first element only when it has a neighbour within tolerance. For example:
    my @nlist = ( 2,4,5,6,7 );
    my $tolerance = 1;

    We would like to have:

    A => [2]
    B => [4 5]
    C => [4 5 6]
    D => [5 6 7]
    E => [6 7]
    How can I modify your code to accommodate this?

    Update: I think I got it.
    #my @nlist = (0,0,1,2,3,3,4,5,6,8,8,10);
    #my @nlist = (0,1,2,3,4,5,6,8,10);
    my @nlist     = (2,4,5,6,7);
    my @key_list  = ('A'..'Z');
    my $tolerance = 1;

    my %hoa;
    my %uniq;

    # Check if first element has a neighbour
    if ( felem_has_nbr( $nlist[0], $nlist[1], $tolerance ) == 1 ) {
        @uniq{ @nlist[ 1 .. $#nlist ] } = ();
    }
    else {
        @uniq{ @nlist[ 0 .. $#nlist ] } = ();
    }

    for my $centroid ( sort { $a <=> $b } keys %uniq ) {
        my $key = shift @key_list;
        $hoa{$key} = [ grep in_range( $centroid, $_ ), @nlist ];
    }

    print "$_ => [@{$hoa{$_}}]\n" for sort keys %hoa;

    sub in_range {
        my ( $centroid, $testnum ) = @_;
        return abs( $centroid - $testnum ) <= $tolerance;
    }

    sub felem_has_nbr {
        my ( $felem, $sec_in_arr, $tl ) = @_;
        return abs( $felem - $sec_in_arr ) <= $tl ? 1 : 0;
    }

    Regards,
    Edward