Re: Identifying duplicates in array or hash based on a subset of data

Store the raw data in an array of arrays and the duplicate information in a hash. Combine type and position to form a single hash key; each value is a reference to an array of indices into the raw data. Any key whose array holds more than one index marks a set of duplicates.

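use strict;
use warnings;

my @raw_data;   # one [id, type, pos] array ref per input row
my %dups;       # "type:pos" => array ref of indices into @raw_data

<DATA>; # skip header
while (my $line = <DATA>) {
    next unless $line =~ /\S/;    # ignore blank lines
    my ($id, $type, $pos) = split ' ', $line;
    push @raw_data, [$id, $type, $pos];
    # push autovivifies the array ref the first time a key is seen
    push @{$dups{"$type:$pos"}}, $#raw_data;
}

# Print every row whose type/pos key was seen more than once
foreach my $entry (@raw_data) {
    my $key = "$entry->[1]:$entry->[2]";
    print "@$entry\n" if @{$dups{$key}} > 1;
}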
__DATA__
ID   Type Pos
1    1    10
2    1    11
3    1    11
4    1    15
5    2    5
6    2    5


OUTPUT:
2 1 11
3 1 11
5 2 5
6 2 5
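
If you would rather see the duplicates grouped by key than listed row by row, the index lists in the hash can be walked directly. This is only a sketch under a few assumptions of mine: the sample rows are hard-coded instead of read from DATA, and @report and the "key => IDs" line format are made up for illustration.

```perl
use strict;
use warnings;

# Same sample rows as above, hard-coded for the sketch: [ID, Type, Pos]
my @raw_data = (
    [1, 1, 10], [2, 1, 11], [3, 1, 11],
    [4, 1, 15], [5, 2, 5],  [6, 2, 5],
);

# "type:pos" => array ref of indices into @raw_data
my %dups;
for my $i (0 .. $#raw_data) {
    my (undef, $type, $pos) = @{ $raw_data[$i] };
    push @{ $dups{"$type:$pos"} }, $i;
}

# One report line per key that was seen more than once
# (@report is a hypothetical name, not part of the original code)
my @report;
for my $key (sort keys %dups) {
    my @idx = @{ $dups{$key} };
    next unless @idx > 1;
    my @ids = map { $raw_data[$_][0] } @idx;   # look the IDs up via the stored indices
    push @report, "$key => IDs @ids";
}
print "$_\n" for @report;
```

Because the hash stores indices and not copies of the rows, every field of a duplicate row stays reachable through @raw_data.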

Bill