This subroutine is for the benefit of any plant ecologists wandering through this monastery. It calculates the sum of nearest-neighbour distances for samples located at discrete sampling points. Put the coordinates of these points in @Matrix as qw(x y x y x y etc.). For each sample, the distance to every other sample is calculated, and the shortest of these is added to the sum, so the sum has as many terms as there are samples. I use it in conjunction with the Fisher-Yates shuffle from the Perl Cookbook to test the significance of the results with a permutation test.

I submit the subroutine below with some trepidation, since I know it is substandard in many respects, but the me of a few months ago really wanted it to be here (before I was able to do this by myself).

sub NearestNeighbours { #call as NearestNeighbours(\@Matrix)
    $sumNN=0; #initialize sum
    my $numberofsamples=scalar(@Matrix)/2;
    my $counter_outer=$numberofsamples;
    my $counter_inner=$numberofsamples;
    my ($xDNM, $yDNM, $xMoves, $yMoves,$i,$j);
    OUTER: for ($i = 1; $i <= $counter_outer; $i++) {
        my $NearestNeighbour=1000000; #initialize to something insanely large
        my $distance = 0; #initialize distance
        my $c=(2*$i-1);
        $xDNM=@Matrix[$c-1]; #xDoesNotMove
        $yDNM=@Matrix[$c];
        INNER: for ($j = 1; $j <= $counter_inner; $j++) {
            my $d=(2*$j-1);
            if ($c==$d) {next INNER}
            $xMoves=@Matrix[$d-1];
            $yMoves=@Matrix[$d];
            $distance = sqrt(($xDNM-$xMoves)^2+($yDNM-$yMoves)^2);
            if ($distance<$NearestNeighbour) {$NearestNeighbour=$distance}
        } # next comparison for this sample
        $sumNN=$sumNN+$NearestNeighbour;
    } #next sample
    return ($sumNN);
}# end of sub.

Re: Nearest Neighbour Analysis subroutine
by danger (Priest) on Jan 15, 2001 at 13:59 UTC

    Two things I should point out: first, your lead comment suggests that the routine be called with a reference to @Matrix, but the routine itself never uses any arguments -- instead it uses the global (or perhaps the file scoped lexical) array of the same name.
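A minimal sketch of what honouring the lead comment could look like, assuming the caller really does pass \@Matrix. The name count_samples is hypothetical and not part of the original post; it only shows the argument being read from @_ and dereferenced instead of relying on a global array.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a reference to an array of x/y pairs, as the
# lead comment NearestNeighbours(\@Matrix) implies, and returns the
# number of samples.
sub count_samples {
    my ($matrix_ref) = @_;          # first argument is the array reference
    my @matrix = @{$matrix_ref};    # dereference into a local copy
    return scalar(@matrix) / 2;     # two coordinates per sample
}

my @Matrix = qw(1 2 3 4 5 6);
print count_samples(\@Matrix), "\n";   # prints 3
```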

    More problematic than that is that you've used the ^ where you really wanted to use ** operator (assuming your distance is meant to be standard Euclidean distance: square root of sum of squared differences). Here's a quick rewrite (which also does not require some arbitrary large number to initialize the nearest neighbor distance):

    sub nneighbours {
        die "Bad Input Matrix: must be even" if @_ % 2;
        my @mat = @_;
        my $sum = 0;
        my $limit = $#mat;
        for(my $i = 0; $i < $limit; $i += 2){
            my $nearest;
            for(my $j = 0; $j < $limit; $j += 2){
                next if $i == $j;
                my $dist = sqrt(($mat[$i] - $mat[$j]) ** 2 +
                                ($mat[$i+1] - $mat[$j+1]) ** 2);
                $nearest = defined $nearest
                    ? ($nearest > $dist ? $dist : $nearest)
                    : $dist;
            }
            $sum += $nearest;
        }
        return $sum;
    }
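The difference between the two operators can be seen in a small sketch. In Perl, ^ is bitwise XOR and ** is exponentiation; the helper name euclidean is hypothetical and used only for illustration.

```perl
use strict;
use warnings;

# Bitwise XOR versus exponentiation on the same operands.
my $xor   = 3 ^ 2;     # binary 11 XOR 10 = 01
my $power = 3 ** 2;    # 3 squared

# Hypothetical helper: Euclidean distance between (x1, y1) and (x2, y2).
sub euclidean {
    my ($x1, $y1, $x2, $y2) = @_;
    return sqrt(($x1 - $x2) ** 2 + ($y1 - $y2) ** 2);
}

print "3 ^ 2  = $xor\n";                            # prints 3 ^ 2  = 1
print "3 ** 2 = $power\n";                          # prints 3 ** 2 = 9
print "distance = ", euclidean(0, 0, 3, 4), "\n";   # prints distance = 5
```

Note also that ^ binds less tightly than +, so an expression such as (a)^2+(b)^2 is not even grouped the way it reads.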
      I thought I had understood your comment, but there is still something I can't figure out. Why does:
      my @test=qw(1 2 3 4 5 6);
      (@test % 2) ? print "uneven\n" : print "even\n";
      print "end\n";

      work fine, while the two versions below print "uneven" in the first case and "even" in the second, when both should report "even"?

      @test=qw(1 2 3 4 5 6);
      testthearray1(\@test);
      sub testthearray1{
          (@_ % 2) ? print "uneven\n" : print "even\n";
      }
      testthearray2();
      sub testthearray2{
          (@_ % 2) ? print "uneven\n" : print "even\n";
      }

      The BlueCamel says "Any arguments passed to a Perl routine come in as the array @_" (pg 219), and this is clearly what you assume happens when you assign my @mat = @_ in the second line. When I add something like print "@_" as the second line of sub testthearray (1 and 2), it shows up as an undefined array (or rather, ARRAY(0x9b25fd8) in the first, and nothing in the second).

      I also find it strange that it is the first of the two subs that apparently requires a reference to @test instead of @_, when this is the one which includes a reference to the array when calling the subroutine.

      What am I overlooking? (this humbled novice asks).

        Elias, if you pass a reference to an array then @_ receives that reference -- that is, one element (no matter how big the array is). In testthearray1(\@test) you are only passing in one element. In testthearray2() you aren't passing in anything at all.

        The routine I showed should just be called as:

        my @array = (1,2,3,4,5,6);
        my $ndist_sum = nneighbours(@array);
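The three calling styles discussed here can be compared in a small sketch. The helper count_args is hypothetical; it simply reports how many elements arrive in @_, which is the number that the @_ % 2 test is applied to.

```perl
use strict;
use warnings;

# Hypothetical helper: report how many elements arrived in @_.
sub count_args {
    return scalar @_;
}

my @test = qw(1 2 3 4 5 6);

print count_args(@test),  "\n";   # array is flattened into a list: prints 6
print count_args(\@test), "\n";   # a single array reference: prints 1
print count_args(),       "\n";   # nothing passed: prints 0
```

One element gives 1 % 2 == 1 ("uneven") and no elements give 0 % 2 == 0 ("even"), which matches the output reported for testthearray1 and testthearray2.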
        @test=qw(1 2 3 4 5 6);
        testthearray1(\@test);
        sub testthearray1{
            (@_ % 2) ? print "uneven\n" : print "even\n";
        }
        You're passing one thing, all the time. That's always odd (and odd!). Remove the \ in front of @test.

        -- Randal L. Schwartz, Perl hacker