in reply to Identifying duplicates in array or hash based on a subset of data

Store the raw data in an array of arrays and the duplicate information in a hash. Combine type and position to form a single key. The value for each key is a reference to an array of indices into the raw data.
use strict;
use warnings;

my @raw_data;
my %dups;
my $i = -1;
<DATA>;    # skip header
while (my $line = <DATA>) {
    my ($id, $type, $pos) = split /\s+/, $line;
    $raw_data[++$i] = [$id, $type, $pos];
    my $key = "$type:$pos";
    $dups{$key} = [] if !exists $dups{$key};
    push @{$dups{$key}}, $i;
}
foreach my $entry (@raw_data) {
    my $key = "$entry->[1]:$entry->[2]";
    print "@$entry\n" if (@{$dups{$key}} > 1);
}
__DATA__
ID Type Pos
1 1 10
2 1 11
3 1 11
4 1 15
5 2 5
6 2 5

OUTPUT:
2 1 11
3 1 11
5 2 5
6 2 5
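Since the hash keeps the indices and not just a count, the same structure can also report which records collide with each other. The following is only a sketch under my own assumptions: the helper name index_by_type_pos and the "key: IDs ..." output format are hypothetical, and the rows are hard-coded instead of read from DATA. It also relies on autovivification, so the explicit "= [] if !exists" step is not needed before the push.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the code above): map each "type:pos"
# key to the list of row indices that share it.
sub index_by_type_pos {
    my ($rows) = @_;
    my %index;
    for my $i (0 .. $#{$rows}) {
        my (undef, $type, $pos) = @{ $rows->[$i] };
        # push autovivifies the array reference on first use
        push @{ $index{"$type:$pos"} }, $i;
    }
    return \%index;
}

# Same sample rows as above: [ID, Type, Pos]
my @raw_data = (
    [1, 1, 10], [2, 1, 11], [3, 1, 11],
    [4, 1, 15], [5, 2, 5],  [6, 2, 5],
);
my $dups = index_by_type_pos(\@raw_data);

# Report each key that occurs more than once, with the IDs sharing it.
for my $key (sort keys %$dups) {
    my @idx = @{ $dups->{$key} };
    next if @idx < 2;
    my @ids = map { $raw_data[$_][0] } @idx;
    print "$key: IDs @ids\n";
}
```

With the sample rows this prints one line per duplicated key ("1:11: IDs 2 3" and "2:5: IDs 5 6"), which can be handier than the flat listing when you need to know which rows belong together.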
Bill