perllearner007 has asked for the wisdom of the Perl Monks concerning the following question:

How can I remove duplicate entries of genes from an enormous gene list using Perl? My list is a text file containing the gene names.

Re: Duplicate entries?
by toolic (Bishop) on Jan 11, 2012 at 19:32 UTC
      This subroutine will sort the list of genes and return any duplicated entries, provided you have put them into an array where each gene (or the line containing the gene) is an element and every duplicated entry has the same format.
      sub findDupsInArray {
          my @array = sort { $a cmp $b } @_;
          my @dups;
          my $previtem;
          foreach my $item (@array) {
              if ( defined $previtem && $item eq $previtem ) {
                  push @dups, $item;    # current gene repeats the previous one
              }    # if
              $previtem = $item;
          }    # foreach
          return @dups;
      }
      Alternatively, you can make the whole list a hash where the gene name is the key and the gene description is the value, then export the hash, since a hash will not allow duplicated keys.
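      A minimal sketch of that hash approach, assuming a tab-separated input file called genes.txt with the gene name in the first column and an optional description in the second (both the filename and the column layout are assumptions, not something from the original post):

      #!/usr/bin/perl
      use strict;
      use warnings;

      # Hypothetical input: genes.txt, tab-separated, gene name in column 1.
      my %genes;
      open my $fh, '<', 'genes.txt' or die "Cannot open genes.txt: $!";
      while ( my $line = <$fh> ) {
          chomp $line;
          my ( $name, $desc ) = split /\t/, $line, 2;
          $genes{$name} = $desc;    # a duplicate key simply overwrites the earlier one
      }
      close $fh;

      # Export the de-duplicated gene list, one name per line.
      print "$_\n" for sort keys %genes;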
Re: Duplicate entries?
by Marshall (Canon) on Jan 11, 2012 at 19:51 UTC
    Another way to remove duplicates is to just use the command-line sort. Command-line sort is not limited to having the entire file memory resident and can sort a HUGE file. Then cycle through that sorted file and don't output lines if the current line matches the immediately preceding line.
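
      A short sketch of that second pass in Perl, assuming the externally sorted output has been written to a file called sorted.txt (the filename is an assumption for the example):

      #!/usr/bin/perl
      use strict;
      use warnings;

      # Read a pre-sorted file line by line and print a line only when it
      # differs from the one before it; duplicates are always adjacent
      # after sorting, so nothing needs to be held in memory.
      open my $fh, '<', 'sorted.txt' or die "Cannot open sorted.txt: $!";
      my $prev;
      while ( my $line = <$fh> ) {
          chomp $line;
          next if defined $prev && $line eq $prev;    # skip repeats
          print "$line\n";
          $prev = $line;
      }
      close $fh;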

      If on *nix, you can pipe the sort output into uniq (http://en.wikipedia.org/wiki/Uniq) to get rid of adjacent duplicates.

      knoppix@Microknoppix:~$ cat rubbish
      cat
      fish
      dog
      apple
      cat
      bird
      knoppix@Microknoppix:~$ sort rubbish | uniq
      apple
      bird
      cat
      dog
      fish
      knoppix@Microknoppix:~$

      I hope this is of interest.

      Cheers,

      JohnGG