in reply to How to eliminate redundancy in huge dataset (1,000 - 10,000)

No need to check each and every record against a hash! Just store each record directly in a hash, keyed on its GI number. Duplicates simply overwrite each other. To get the result, call the values function on the hash and you get a nice list of all your unique records.
```perl
use strict;
use warnings;

my %database;
while (my $record = <DATA>) {
    # Key each record on its GI number, the second |-separated field;
    # a later record with the same GI number overwrites the earlier one.
    $database{(split /\|/, $record, 3)[1]} = $record;
}
print values %database;

__DATA__
>gi|49329899|gb|AAT60545.1| chitinase [Bacillus thuringiensis serovar konkukian str. 97-27 MLNKFKFICCTLVIFLLLPLAPFQAQAANNLGSKLLVGYWHNFDNGTGIIKLRDVSPKWDVINVSFGETG GDRSTVEFSPVYGTDAEFKSDISYLKSKGKKVVLSIGGQNGVVLLPDNAAKQRFINSIQSLIDKYGFDGI
>gi|49330053|gb|AAT60699.1| chitinase [Bacillus thuringiensis serovar konkukian str. 97-27 MKSKKFTLLLLSLLLFLPLFLTNFITPNVVLADSQKQDQKIVGYFPSWGIYGRNYQVADIDASKLTHLNY AFADICWNGKHGNPSTHPDNPNKQTWNCKESGVPLQNKEVPNGTLVLGEPWADVTKSYPGSGTTWEDCDK
>gi|49478343|ref|YP_037789.1| chitinase Bacillus thuringiensis serovar konkukian str. 97-27 MLNKFKFICCTLVIFLLLPLAPFQAQAANNLGSKLLVGYWHNFDNGTGIIKLRDVSPKWDVINVSFGETG GDRSTVEFSPVYGTDAEFKSDISYLKSKGKKVVLSIGGQNGVVLLPDNAAKQRFINSIQSLIDKYGFDGI
>gi|49329899|gb|AAT60545.1| chitinase [Bacillus thuringiensis serovar konkukian str. 97-27 MLNKFKFICCTLVIFLLLPLAPFQAQAANNLGSKLLVGYWHNFDNGTGIIKLRDVSPKWDVINVSFGETG GDRSTVEFSPVYGTDAEFKSDISYLKSKGKKVVLSIGGQNGVVLLPDNAAKQRFINSIQSLIDKYGFDGI
>gi|49478497|ref|YP_034712.1| chitinase Bacillus thuringiensis serovar konkukian str. 97-27 MKSKKFTLLLLSLLLFLPLFLTNFITPNVVLADSQKQDQKIVGYFPSWGIYGRNYQVADIDASKLTHLNY AFADICWNGKHGNPSTHPDNPNKQTWNCKESGVPLQNKEVPNGTLVLGEPWADVTKSYPGSGTTWEDCDK YARCGNFGELKRLKAKYPHLKTIISVGGWTWSNRFSDMAADEKTRKVFADSTVDFLREYGFDGVDLDWEY
```
Output (the records come out in the hash's internal order, not the input order):
```
>gi|49478497|ref|YP_034712.1| chitinase Bacillus thuringiensis serovar konkukian str. 97-27 MKSKKFTLLLLSLLLFLPLFLTNFITPNVVLADSQKQDQKIVGYFPSWGIYGRNYQVADIDASKLTHLNY AFADICWNGKHGNPSTHPDNPNKQTWNCKESGVPLQNKEVPNGTLVLGEPWADVTKSYPGSGTTWEDCDK YARCGNFGELKRLKAKYPHLKTIISVGGWTWSNRFSDMAADEKTRKVFADSTVDFLREYGFDGVDLDWEY
>gi|49330053|gb|AAT60699.1| chitinase [Bacillus thuringiensis serovar konkukian str. 97-27 MKSKKFTLLLLSLLLFLPLFLTNFITPNVVLADSQKQDQKIVGYFPSWGIYGRNYQVADIDASKLTHLNY AFADICWNGKHGNPSTHPDNPNKQTWNCKESGVPLQNKEVPNGTLVLGEPWADVTKSYPGSGTTWEDCDK
>gi|49478343|ref|YP_037789.1| chitinase Bacillus thuringiensis serovar konkukian str. 97-27 MLNKFKFICCTLVIFLLLPLAPFQAQAANNLGSKLLVGYWHNFDNGTGIIKLRDVSPKWDVINVSFGETG GDRSTVEFSPVYGTDAEFKSDISYLKSKGKKVVLSIGGQNGVVLLPDNAAKQRFINSIQSLIDKYGFDGI
>gi|49329899|gb|AAT60545.1| chitinase [Bacillus thuringiensis serovar konkukian str. 97-27 MLNKFKFICCTLVIFLLLPLAPFQAQAANNLGSKLLVGYWHNFDNGTGIIKLRDVSPKWDVINVSFGETG GDRSTVEFSPVYGTDAEFKSDISYLKSKGKKVVLSIGGQNGVVLLPDNAAKQRFINSIQSLIDKYGFDGI
```
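If the input order matters, or you want the first occurrence of each GI number to win rather than the last, one variation is to remember only the GI numbers already seen and keep a record the first time its number turns up. This is a minimal sketch, assuming one record per line as in the data above; the sub name unique_by_gi and the shortened sample records are hypothetical, chosen for illustration only.

```perl
use strict;
use warnings;

# Read records (one per line) from the filehandle $in and return those whose
# GI number, the second |-separated field, has not been seen before.
# The first occurrence wins and the input order is preserved; %seen holds
# only the GI numbers, not the records themselves.
sub unique_by_gi {
    my ($in) = @_;
    my (%seen, @unique);
    while (my $record = <$in>) {
        my $gi = (split /\|/, $record, 3)[1];
        next unless defined $gi;               # skip lines with no | field
        push @unique, $record unless $seen{$gi}++;
    }
    return @unique;
}

# Hypothetical sample: GI headers from the data above, with descriptions
# and sequences shortened for illustration.
my $sample = <<'END';
>gi|49329899|gb|AAT60545.1| chitinase MLNKFKFICC
>gi|49330053|gb|AAT60699.1| chitinase MKSKKFTLLL
>gi|49478343|ref|YP_037789.1| chitinase MLNKFKFICC
>gi|49329899|gb|AAT60545.1| chitinase MLNKFKFICC
>gi|49478497|ref|YP_034712.1| chitinase MKSKKFTLLL
END

# Read the sample through an in-memory filehandle.
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print unique_by_gi($fh);
close $fh;
```

For a file too large to keep even the unique records in memory, you could print each record inside the loop instead of pushing it onto @unique, so that only the GI numbers stay in memory.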

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James