Re: clustering pairs

This one consumes the data array as it finds matches: each matching pair is spliced out of @data. When the current cluster runs out of matches (inner loop), it starts a new cluster (outer loop), repeating until the array is empty.

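use strict;
use warnings;

# Read the pairs from the __DATA__ section, one pair per line.
my @data = <DATA>;
chomp @data;

my $cluster = 1;
while (@data) {

    # Start a new cluster with the first remaining pair.
    my $pair = shift @data;
    print "\ncluster$cluster\n$pair\n";

    # "ID5141.C1665 ID5141.C2448" splits into ("ID5141", "C1665", "ID5141", "C2448");
    # only the two C-numbers are kept.
    my (undef, $id1, undef, $id2) = split /[. ]/, $pair;

    my $i = 0;
    while ($i < @data) {

        # \b...\b stops C1221 from matching inside e.g. C12210; \Q...\E quotes the id.
        if ($data[$i] =~ /\b\Q$id1\E\b/ || $data[$i] =~ /\b\Q$id2\E\b/) {

            # Remove the matching pair from @data and follow it.
            $pair = splice @data, $i, 1;
            print "$pair\n";
            (undef, $id1, undef, $id2) = split /[. ]/, $pair;
            $i = 0;    # rescan from the top with the new ids
        }
        else {
            $i++;
        }
    }
    $cluster++;
}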

__DATA__
ID5141.C1665 ID5141.C2448
ID5141.C1253 ID5144.C2039
ID5141.C1596 ID5144.C1956
ID5141.C1906 ID5144.C2149
ID5141.C1221 ID5144.C1956
ID5141.C2149 ID5141.C2386
ID5141.C2039 ID5142.C1221
ID5141.C5887 ID5141.C7685
ID5141.C1005 ID5142.C2808
ID5141.C1046 ID5141.C1596
ID5141.C2386 ID5141.C4990
ID5141.C7685 ID5141.C4888

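Update: Although this works for the given sample data, it will only find 0 or 1 further match for any id pair: once a pair is matched, its ids replace the search ids, so other pairs linked only to earlier members of the cluster end up in clusters of their own. I was thinking of linked lists, such as chains of sectors in a filesystem.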
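A minimal sketch of one way around that limitation, assuming the same one-pair-per-line format: instead of replacing the search ids, keep every id of the current cluster in a hash, so branching clusters are collected too. The sub names cluster_pairs and ids_of and the four sample pairs are hypothetical; like the code above, it compares only the C-numbers and ignores the IDxxxx prefixes.

```perl
use strict;
use warnings;

# Hypothetical helper: return the two C-numbers of a pair line,
# dropping the IDxxxx prefixes just as split /[. ]/ does above.
sub ids_of {
    my @fields = split /[. ]/, $_[0];
    return @fields[1, 3];
}

# Hypothetical variant of the loop above: %seen collects every id of every
# pair already in the current cluster, so a pair joins if either of its ids
# appears anywhere in the cluster, not only in the last pair matched.
sub cluster_pairs {
    my @data = @_;
    my @clusters;
    while (@data) {
        my $first   = shift @data;
        my @members = ($first);
        my %seen;
        $seen{$_} = 1 for ids_of($first);

        my $i = 0;
        while ($i < @data) {
            my ($id1, $id2) = ids_of($data[$i]);
            if ($seen{$id1} || $seen{$id2}) {
                push @members, splice @data, $i, 1;
                $seen{$id1} = $seen{$id2} = 1;
                $i = 0;    # newly seen ids may link pairs that were skipped
            }
            else {
                $i++;
            }
        }
        push @clusters, \@members;
    }
    return @clusters;
}

# Made-up sample: pair 3 links to pair 1 via C10, after pair 2 has
# already been followed via C20.
my @pairs = (
    'ID1.C10 ID1.C20',
    'ID1.C20 ID1.C30',
    'ID1.C10 ID1.C40',
    'ID1.C50 ID1.C60',
);

my $n = 1;
for my $c (cluster_pairs(@pairs)) {
    print "\ncluster$n\n";
    print "$_\n" for @$c;
    $n++;
}
```

With these four pairs it prints two clusters (the first three pairs, then the last one), whereas the chain-following loop above puts the third pair in a cluster of its own, because C10 is no longer among its search ids by then. Rescanning from the top after every match keeps the sketch simple but makes it quadratic; a union-find structure would scale better for large inputs.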
