ID5144.C2039 ID5141.C2039 ID5142.C1221

my ($first, $second) = map {s/.*\.//; $_} split ' ';

Are you deleting the first half of the ID?
While the first column contains only "ID5141" entries, the second column contains distinct prefixes that are probably important to the problem.

ID5144.C2039 is different from ID5141.C2039, for example.
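
To make the concern concrete, here is a small sketch of what that map does to a single input line. The sample line is an assumption, made up from the IDs quoted above; it is not taken from the OP's data file.

```perl
use strict;
use warnings;

# Hypothetical input line built from the IDs quoted above.
my $line = 'ID5141.C2039 ID5144.C2039';

# s/.*\.// deletes everything up to and including the last dot,
# so only the part after the dot survives in each field.
my ($first, $second) = map { s/.*\.//; $_ } split ' ', $line;

print "$first $second\n";   # prints "C2039 C2039"
```

After the substitution the two fields are indistinguishable, which is exactly the information loss being asked about.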

Re^3: clustering pairs
by jdporter (Paladin) on Dec 01, 2008 at 19:44 UTC

    Looking at what the OP considers to be valid clusters, it appears that only the second part of each ID (C\d+) is considered when determining whether two items are "equal"; the first part (ID\d+) is ignored. Of course, the entire item must be remembered for when it is output again.
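
One way to do both, compare on the C\d+ part while remembering the whole item, might look like the sketch below. The item list and the %by_suffix hash are illustrative assumptions, and this only shows the keying step, not the full clustering of pairs.

```perl
use strict;
use warnings;

# Hypothetical sample items, taken from the IDs quoted in this thread.
my @items = qw(ID5144.C2039 ID5141.C2039 ID5142.C1221);

# Group items by the part after the dot, keeping the full ID for output.
my %by_suffix;
for my $item (@items) {
    my ($suffix) = $item =~ /\.(C\d+)$/
        or next;    # skip anything that does not end in .C\d+
    push @{ $by_suffix{$suffix} }, $item;
}

for my $suffix (sort keys %by_suffix) {
    print "$suffix: @{ $by_suffix{$suffix} }\n";
}
```

Because the match captures the suffix instead of substituting in place, the original strings stay intact and can be printed unchanged.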