```perl
use strict;

my %obj1found;    # Obj1 thing => number of HIT lines it appears in
my %obj2found;    # Obj2 thing => " obj1a  obj1b " (each Obj1 that hit it, space-padded)

while (<DATA>) {
    if ( / HIT \s+ (\S+ \s+ [\d.]+) \s+ (\S+ \s+ [\d.]+) /x ) {
        my ( $obj1, $obj2 ) = ( $1, $2 );
        $obj1found{ $obj1 }++;
        # append " $obj1 " only once per Obj1, so neighbouring entries end up two spaces apart
        $obj2found{ $obj2 } .= " $obj1 "
            unless ( $obj2found{ $obj2 } =~ / \Q$obj1\E / );
    }
}

# note: two spaces between elements, to match how %obj2found values are built;
# this relies on the input being ordered by Obj1 thing, as the sample data is
my $match_all = join( '  ', sort keys %obj1found );

print join( "\t\n", "\nList of Obj2 things found in all Obj1's:",
            grep { $obj2found{$_} =~ /\Q$match_all\E/ } sort keys %obj2found ), "\n";

for my $obj1 ( sort keys %obj1found ) {
    print join( "\t\n", "\nList of Obj2 things found only in $obj1:",
                grep { $obj2found{$_} =~ /^ \Q$obj1\E $/ } sort keys %obj2found ), "\n";
}

__DATA__
HIT object1 563.43.78 object3 123.89.7777
HIT object1 563.43.78 object10 123.89.7777
HIT object1 563.43.78 object2 453.78.122
HIT object1 563.43.78 object5 457.8888.1
HIT object1 563.43.78 object4 123.89.7777
HIT object1 563.43.78 object6 566.2222.11
HIT object2 563.43.78 object3 123.89.7777
HIT object2 563.43.78 object7 456.222.1111
HIT object2 563.43.78 object8 990.7777.66
HIT object2 563.43.78 object5 457.8888.1
HIT object2 563.43.78 object13 123.89.7777
HIT object2 563.43.78 object9 1223.333.111
```
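This approach generalizes to any number of distinct "Obj1" things, as long as the input is ordered by Obj1 thing (as the sample data is), because the "found in all" test compares each entry against the sorted list of Obj1 names. If you have more than two, you might want to look at groupings other than "found in all Obj1 things" and "found only in a single Obj1 thing"; that's "left as an exercise..."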
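One possible take on that exercise, offered only as a sketch and not as part of the reply above: record, for each Obj2 thing, the *set* of Obj1 things that hit it (a hash of hashes), then group Obj2 things by that set. The names `%seen_in`, `%group` and `$all_sig` are hypothetical, and the sample records are held in an array instead of `__DATA__` so the snippet runs on its own. Because each set is sorted before it is joined into a key, this variant does not depend on the order in which records arrive, and every combination of Obj1 things (not just "all" and "only one") gets its own group.

```perl
use strict;
use warnings;

# Same sample records as above, inlined for a self-contained example.
my @records = (
    'HIT object1 563.43.78 object3 123.89.7777',
    'HIT object1 563.43.78 object10 123.89.7777',
    'HIT object1 563.43.78 object2 453.78.122',
    'HIT object1 563.43.78 object5 457.8888.1',
    'HIT object1 563.43.78 object4 123.89.7777',
    'HIT object1 563.43.78 object6 566.2222.11',
    'HIT object2 563.43.78 object3 123.89.7777',
    'HIT object2 563.43.78 object7 456.222.1111',
    'HIT object2 563.43.78 object8 990.7777.66',
    'HIT object2 563.43.78 object5 457.8888.1',
    'HIT object2 563.43.78 object13 123.89.7777',
    'HIT object2 563.43.78 object9 1223.333.111',
);

my %seen_in;     # $seen_in{$obj2}{$obj1} = 1 : which Obj1 things hit each Obj2 thing
my %obj1_all;    # every distinct Obj1 thing

for (@records) {
    next unless / HIT \s+ (\S+ \s+ [\d.]+) \s+ (\S+ \s+ [\d.]+) /x;
    my ( $obj1, $obj2 ) = ( $1, $2 );
    $seen_in{$obj2}{$obj1} = 1;
    $obj1_all{$obj1}       = 1;
}

# Group Obj2 things by the exact (sorted) set of Obj1 things they appear in.
my %group;       # "objA | objB" => [ Obj2 things found in exactly that set ]
for my $obj2 ( sort keys %seen_in ) {
    my $signature = join ' | ', sort keys %{ $seen_in{$obj2} };
    push @{ $group{$signature} }, $obj2;
}

# The group whose set equals the full list of Obj1 things is "found in all".
my $all_sig = join ' | ', sort keys %obj1_all;
for my $sig ( sort keys %group ) {
    my $label = $sig eq $all_sig ? 'all Obj1 things' : $sig;
    print "\nObj2 things found in exactly: $label\n";
    print "\t$_\n" for @{ $group{$sig} };
}
```

With only two Obj1 things this reproduces the "found in all" and "found only in" lists; with more, the extra keys of `%group` are the in-between combinations.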
In reply to Re: How to check for duplicate entries by graff, in thread How to check for duplicate entries by Angharad