Nearest Neighbour Analysis subroutine

This subroutine is for the benefit of any plant ecologists wandering through this monastery. It calculates the sum of nearest-neighbour distances for samples taken at discrete sampling points. The coordinates of these points should be put in @Matrix as a flat list, qw(x y x y x y etc.). For each sample, the distance to every other sample is calculated, and the shortest of these is added to the sum, so the sum has as many terms as there are samples. I use it in conjunction with the Fisher-Yates shuffle from the Perl Cookbook to test the significance of the result with a permutation test.
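For readers wondering how the shuffle and the nearest-neighbour sum might fit together, here is a rough sketch of one possible permutation test. It is not the code from this post. It assumes that the first `$k` of the sampling points are the occupied ones, and that the null hypothesis is "any `$k` of the sampling points are equally likely to be occupied". The names `nn_sum`, `fisher_yates_shuffle` and `permutation_p`, the point list, and the one-sided "as small or smaller" comparison are all illustrative assumptions.

```perl
use strict;
use warnings;

# Sum of nearest-neighbour distances for a flat (x, y, x, y, ...) list.
sub nn_sum {
    my @xy = @_;
    my $sum = 0;
    for (my $i = 0; $i < @xy; $i += 2) {
        my $nearest;
        for (my $j = 0; $j < @xy; $j += 2) {
            next if $i == $j;
            my $d = sqrt(($xy[$i] - $xy[$j]) ** 2
                       + ($xy[$i + 1] - $xy[$j + 1]) ** 2);
            $nearest = $d if !defined $nearest || $d < $nearest;
        }
        $sum += $nearest if defined $nearest;
    }
    return $sum;
}

# In-place Fisher-Yates shuffle of the array behind $aref.
sub fisher_yates_shuffle {
    my ($aref) = @_;
    for (my $i = $#$aref; $i > 0; $i--) {
        my $j = int rand($i + 1);
        @$aref[$i, $j] = @$aref[$j, $i];
    }
}

# $points: reference to a list of [x, y] pairs (all sampling points).
# $k:      ASSUMPTION: the first $k points are the occupied ones.
# $runs:   number of random permutations to draw.
sub permutation_p {
    my ($points, $k, $runs) = @_;
    my $observed = nn_sum(map { @$_ } @{$points}[0 .. $k - 1]);
    my @pool     = @$points;
    my $as_small = 0;
    for (1 .. $runs) {
        fisher_yates_shuffle(\@pool);
        my $stat = nn_sum(map { @$_ } @pool[0 .. $k - 1]);
        $as_small++ if $stat <= $observed;
    }
    # One-sided p-value; the observed arrangement counts as one permutation.
    return ($observed, ($as_small + 1) / ($runs + 1));
}

# Made-up coordinates, purely for illustration.
my @points = ([0, 0], [0, 1], [1, 0], [10, 10], [20, 5], [5, 20]);
my ($observed, $p) = permutation_p(\@points, 3, 999);
print "observed sum = $observed, p = $p\n";
```

A small p-value here would suggest the occupied points sit closer together than a random subset of the sampling points would. Whether that is the right null model depends on the sampling design.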

I submit the subroutine below with some trepidation, since I know it is substandard in many respects, but the me of a few months ago really wanted it to be here (before I was able to do this by myself).

sub NearestNeighbours {
    #call as NearestNeighbours(\@Matrix)
    $sumNN=0; #initialize sum
    my $numberofsamples=scalar(@Matrix)/2;
    my $counter_outer=$numberofsamples;
    my $counter_inner=$numberofsamples;
    my ($xDNM, $yDNM, $xMoves, $yMoves,$i,$j);
    OUTER: for ($i = 1; $i <= $counter_outer; $i++) {
        my $NearestNeighbour=1000000; #initialize to something insanely large
        my $distance = 0; #initialize distance
        my $c=(2*$i-1);
        $xDNM=@Matrix[$c-1]; #xDoesNotMove
        $yDNM=@Matrix[$c];
        INNER: for ($j = 1; $j <= $counter_inner; $j++) {
            my $d=(2*$j-1);
            if ($c==$d) {next INNER}
            $xMoves=@Matrix[$d-1];
            $yMoves=@Matrix[$d];
            $distance = sqrt(($xDNM-$xMoves)^2+($yDNM-$yMoves)^2);
            if ($distance<$NearestNeighbour) {$NearestNeighbour=$distance}
        } # next comparison for this sample
        $sumNN=$sumNN+$NearestNeighbour;
    } #next sample
    return ($sumNN);
}# end of sub.

Re: Nearest Neighbour Analysis subroutine by danger (Priest) on Jan 15, 2001 at 13:59 UTC
Two things I should point out: first, your lead comment suggests that the routine be called with a reference to `@Matrix`, but the routine itself never uses any arguments -- instead it uses the global (or perhaps the file-scoped lexical) array of the same name. More problematic is that you've used the `^` operator where you really wanted the `**` operator (assuming your distance is meant to be standard Euclidean distance: square root of sum of squared differences). Here's a quick rewrite (which also does not require some arbitrary large number to initialize the nearest neighbor distance):

sub nneighbours {
    die "Bad Input Matrix: must be even" if @_ % 2;
    my @mat = @_;
    my $sum = 0;
    my $limit = $#mat;
    for(my $i = 0; $i < $limit; $i += 2){
        my $nearest;
        for(my $j = 0; $j < $limit; $j += 2){
            next if $i == $j;
            my $dist = sqrt(($mat[$i] - $mat[$j]) ** 2 +
                            ($mat[$i+1] - $mat[$j+1]) ** 2);
            $nearest = defined $nearest ?
                       ($nearest > $dist ? $dist : $nearest) : $dist;
        }
        $sum += $nearest;
    }
    return $sum;
}
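The difference between the two operators is easy to check directly. In Perl, `^` is bitwise XOR and `**` is exponentiation; `^` also binds more loosely than `+`, so an expression like `($a)^2+($b)^2` does not even group the way it looks. A minimal sketch, with made-up values for `$dx` and `$dy`:

```perl
use strict;
use warnings;

my ($dx, $dy) = (3, 4);   # illustrative coordinate differences

my $xor   = $dx ^ 2;      # bitwise XOR: 0b11 ^ 0b10 = 0b01
my $power = $dx ** 2;     # exponentiation: 3 squared

print "3 ^ 2  = $xor\n";    # prints 1
print "3 ** 2 = $power\n";  # prints 9

# Euclidean distance for a 3-4-5 triangle, using ** as intended.
my $dist = sqrt($dx ** 2 + $dy ** 2);
print "distance = $dist\n"; # prints 5
```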
Re: Re: Nearest Neighbour Analysis subroutine by Elias (Pilgrim) on Jan 15, 2001 at 22:02 UTC
I thought I had understood your comment, but there is still something I can't figure out. Why does:

my @test=qw(1 2 3 4 5 6);
(@test % 2) ? print "uneven\n" : print "even\n";
print "end\n";

work fine, while the two versions below result in "uneven" in the first case and "even" in the second, when both should be reported as "even"?

@test=qw(1 2 3 4 5 6);
testthearray1(\@test);
sub testthearray1{
    (@_ % 2) ? print "uneven\n" : print "even\n";
}
testthearray2();
sub testthearray2{
    (@_ % 2) ? print "uneven\n" : print "even\n";
}

The BlueCamel says "Any arguments passed to a Perl routine come in as the array @_" (pg 219), and this is clearly what you assume happens when you assign `my @mat = @_` in the second line. When I add something like `print "@_"` as the second line of the `testthearray` subs (1 and 2), it shows up as an undefined array (or rather, ARRAY(0x9b25fd8) in the first, and nothing in the second). I also find it strange that it is the first of the two subs that apparently requires a reference to `@test` instead of `@_`, when this is the one which includes a reference to the array when calling the subroutine. What am I overlooking? (this humbled novice asks).
Re: Re: Re: Nearest Neighbour Analysis subroutine by danger (Priest) on Jan 15, 2001 at 22:12 UTC
Elias, if you pass a reference to an array then `@_` receives that reference -- that is, one element (no matter how big the array is). In `testthearray1(\@test)` you are only passing in one element. In `testthearray2()` you aren't passing in anything at all. The routine I showed should just be called as:

my @array = (1,2,3,4,5,6);
my $ndist_sum = nneighbours(@array);
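The three cases can be seen side by side by counting what actually arrives in `@_`. This is only a sketch; `count_args` and `count_via_ref` are hypothetical helpers, not part of the routines above.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many elements arrived in @_.
sub count_args { return scalar @_ }

# Hypothetical helper: expects a single array reference and counts
# the elements of the array it points to.
sub count_via_ref {
    my ($aref) = @_;
    return scalar @$aref;
}

my @test = qw(1 2 3 4 5 6);

my $flat  = count_args(@test);      # array is flattened into @_
my $ref   = count_args(\@test);     # @_ holds one reference
my $none  = count_args();           # @_ is empty
my $deref = count_via_ref(\@test);  # dereference to reach the elements

print "flat list: $flat\n";     # prints 6
print "reference: $ref\n";      # prints 1
print "no args:   $none\n";     # prints 0
print "via deref: $deref\n";    # prints 6
```

So `@_ % 2` is 1 ("uneven") when a single reference is passed and 0 ("even") when nothing is passed. A routine that is meant to take `\@array` has to dereference it, as `count_via_ref` does.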
Re: Re: Re: Nearest Neighbour Analysis subroutine by merlyn (Sage) on Jan 15, 2001 at 22:09 UTC
@test=qw(1 2 3 4 5 6);
testthearray1(\@test);
sub testthearray1{
    (@_ % 2) ? print "uneven\n" : print "even\n";
}

You're passing one thing, all the time. That's always odd (and odd!). Remove the `\` in front of `@test`.

-- Randal L. Schwartz, Perl hacker