select only duplicate entries

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: select only duplicate entries by ikegami (Patriarch) on Aug 24, 2006 at 08:06 UTC
What have you tried? You haven't demonstrated any effort at solving your own problems. (I presume combine duplicate entries was also posted by you.) A hash keyed by protein would be useful. The values would be lists of organs. You can use `split` to separate the protein from the organ.
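A minimal sketch of that hash-of-lists idea, assuming whitespace-separated `protein organ` lines like the sample data posted further down the thread. The sub name `group_by_protein` and the three input lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: group organs by protein.
# Assumes each line holds "protein organ" separated by whitespace.
sub group_by_protein {
    my @lines = @_;
    my %organs_for;
    for my $line (@lines) {
        my ($protein, $organ) = split ' ', $line;
        next unless defined $organ;    # skip blank or malformed lines
        push @{ $organs_for{$protein} }, $organ;
    }
    return \%organs_for;
}

my $organs_for = group_by_protein(
    "protein1 stomach\n", "protein3 muscle\n", "protein3 heart\n",
);

# Proteins seen more than once are the duplicate entries.
my @duplicates = sort grep { @{ $organs_for->{$_} } > 1 } keys %$organs_for;
print "@duplicates\n";
```

With those three lines only `protein3` has more than one organ, so it is the only key left after the `grep`.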
Re: select only duplicate entries by GrandFather (Saint) on Aug 24, 2006 at 08:30 UTC
You may find the answers to combine duplicate entries helpful as a starting point. While building the hash, take note of the number of elements in the largest array. Then iterate from 1 to that number. In each iteration, use grep to pull out the arrays whose element count matches the current number; those hold the data for the file for that count.
DWIM is Perl's answer to Gödel
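A rough sketch of that count-then-grep loop, under a few assumptions: the hash is already built (the sample entries here are invented), the maximum is computed afterwards with `List::Util::max` rather than tracked while reading, and the groups are printed instead of being written to one file per count.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Sample hash of lists, standing in for the one built while reading the input.
my %organs_for = (
    protein1 => ['stomach'],
    protein3 => [qw(muscle heart brain)],
    protein5 => [qw(toes mouth)],
);

# Size of the largest list (computed after the fact here, for brevity).
my $max = max map { scalar @$_ } values %organs_for;

# For each size 1..$max, grep out the proteins with exactly that many organs.
my %proteins_by_count;
for my $n (1 .. $max) {
    my @keys = sort grep { @{ $organs_for{$_} } == $n } keys %organs_for;
    $proteins_by_count{$n} = \@keys;
    print "$n: @keys\n";
}
```

To get separate files, the `print` inside the loop would be replaced by opening a file named after `$n` and writing each key with its organs.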
Re: select only duplicate entries by borisz (Canon) on Aug 24, 2006 at 08:55 UTC
    my %h;
    while ( defined ( $_ = <DATA> )){
        chomp;
        my ( $k, $v) = split ' ';
        push @{$h{$k}}, $v;
    }
    open my $fh1, '>', '/tmp/1.txt' or die;
    open my $fh2, '>', '/tmp/2.txt' or die;
    for my $k ( sort keys %h ) {
        my $c = @{$h{$k}};
        for ( @{$h{$k}}){
            $c > 1 ? print $fh2 "$k\t$_\n" : print $fh1 "$k\t$_\n";
        }
    }
    __DATA__
    protein1 stomach
    protein2 head
    protein3 muscle
    protein3 heart
    protein3 brain
    protein4 leg
    protein5 toes
    protein5 mouth
    protein6 ear

Boris
Re^2: select only duplicate entries by ikegami (Patriarch) on Aug 24, 2006 at 15:08 UTC
`while ( defined ( $_ = <DATA> )){` is equivalent to `while ( <DATA> ){`.
`$c > 1 ? print $fh2 "$k\t$_\n" : print $fh1 "$k\t$_\n";` is equivalent to `print { $c == 1 ? $fh1 : $fh2 } "$k\t$_\n";`. Alternatively, do `my $fh = $c == 1 ? $fh1 : $fh2;` outside the inner loop and print to `$fh`.
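For anyone who wants to try the second suggestion, here is a small self-contained sketch. It assumes in-memory filehandles in place of the two files under `/tmp`, and a two-key sample hash invented for the demo; the variable names `$single` and `$multi` are not from the original code.

```perl
use strict;
use warnings;

# In-memory filehandles stand in for the two output files (demo assumption).
my ($single, $multi) = ('', '');
open my $fh1, '>', \$single or die "Cannot open in-memory handle: $!";
open my $fh2, '>', \$multi  or die "Cannot open in-memory handle: $!";

my %h = (protein1 => ['stomach'], protein3 => [qw(muscle heart)]);

for my $k (sort keys %h) {
    my $c = @{ $h{$k} };
    # Choose the handle once per key, outside the inner loop.
    my $fh = $c == 1 ? $fh1 : $fh2;
    print {$fh} "$k\t$_\n" for @{ $h{$k} };
}
close $fh1;
close $fh2;
```

After the loop `$single` holds the lines for keys seen once and `$multi` the lines for keys seen more than once.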
Re^3: select only duplicate entries by borisz (Canon) on Aug 24, 2006 at 16:17 UTC
Thanks, I know. I tried to keep it simple for the newbies.
Boris
Re: select only duplicate entries by Mandrake (Chaplain) on Aug 24, 2006 at 09:50 UTC
Try:

    #!/usr/bin/perl -w
    use strict;
    my %hash;
    (!/^$/) && (push @{$hash{(split /\s+/,$_)[0]}}, (split /\s+/,$_)[1]) while(<DATA>);
    open TMP1, '>duplicates.txt' or die;
    open TMP2, '>distinct.txt' or die;
    for my $key (keys %hash) {
        for (@{$hash{$key}}) {
            (@{$hash{$key}} > 1) ? print TMP1 "$key\t$_\n" : print TMP2 "$key\t$_\n" ;
        }
    }
    __DATA__
    protein1 stomach
    protein2 head
    protein3 muscle
    protein3 heart
    protein3 brain
    protein4 leg
    protein5 toes
    protein5 mouth
    protein6 ear

Please refer to combine duplicate entries for similar solutions. Thanks.