use strict;
my %database;
while (my $record = <DATA>) {
    $database{(split /\|/, $record, 3)[1]} = $record;
}
print values %database;
__DATA__
>gi|49329899|gb|AAT60545.1| chitinase [Bacillus thuringiensis serovar konkukian str. 97-27 MLNKFKFICCTLVIFLLLPLAPFQAQAANNLGSKLLVGYWHNFDNGTGIIKLRDVSPKWDVINVSFGETG GDRSTVEFSPVYGTDAEFKSDISYLKSKGKKVVLSIGGQNGVVLLPDNAAKQRFINSIQSLIDKYGFDGI
>gi|49330053|gb|AAT60699.1| chitinase [Bacillus thuringiensis serovar konkukian str. 97-27 MKSKKFTLLLLSLLLFLPLFLTNFITPNVVLADSQKQDQKIVGYFPSWGIYGRNYQVADIDASKLTHLNY AFADICWNGKHGNPSTHPDNPNKQTWNCKESGVPLQNKEVPNGTLVLGEPWADVTKSYPGSGTTWEDCDK
>gi|49478343|ref|YP_037789.1| chitinase Bacillus thuringiensis serovar konkukian str. 97-27 MLNKFKFICCTLVIFLLLPLAPFQAQAANNLGSKLLVGYWHNFDNGTGIIKLRDVSPKWDVINVSFGETG GDRSTVEFSPVYGTDAEFKSDISYLKSKGKKVVLSIGGQNGVVLLPDNAAKQRFINSIQSLIDKYGFDGI
>gi|49329899|gb|AAT60545.1| chitinase [Bacillus thuringiensis serovar konkukian str. 97-27 MLNKFKFICCTLVIFLLLPLAPFQAQAANNLGSKLLVGYWHNFDNGTGIIKLRDVSPKWDVINVSFGETG GDRSTVEFSPVYGTDAEFKSDISYLKSKGKKVVLSIGGQNGVVLLPDNAAKQRFINSIQSLIDKYGFDGI
>gi|49478497|ref|YP_034712.1| chitinase Bacillus thuringiensis serovar konkukian str. 97-27 MKSKKFTLLLLSLLLFLPLFLTNFITPNVVLADSQKQDQKIVGYFPSWGIYGRNYQVADIDASKLTHLNY AFADICWNGKHGNPSTHPDNPNKQTWNCKESGVPLQNKEVPNGTLVLGEPWADVTKSYPGSGTTWEDCDK YARCGNFGELKRLKAKYPHLKTIISVGGWTWSNRFSDMAADEKTRKVFADSTVDFLREYGFDGVDLDWEY

Output:

>gi|49478497|ref|YP_034712.1| chitinase Bacillus thuringiensis serovar konkukian str. 97-27 MKSKKFTLLLLSLLLFLPLFLTNFITPNVVLADSQKQDQKIVGYFPSWGIYGRNYQVADIDASKLTHLNY AFADICWNGKHGNPSTHPDNPNKQTWNCKESGVPLQNKEVPNGTLVLGEPWADVTKSYPGSGTTWEDCDK YARCGNFGELKRLKAKYPHLKTIISVGGWTWSNRFSDMAADEKTRKVFADSTVDFLREYGFDGVDLDWEY
>gi|49330053|gb|AAT60699.1| chitinase [Bacillus thuringiensis serovar konkukian str. 97-27 MKSKKFTLLLLSLLLFLPLFLTNFITPNVVLADSQKQDQKIVGYFPSWGIYGRNYQVADIDASKLTHLNY AFADICWNGKHGNPSTHPDNPNKQTWNCKESGVPLQNKEVPNGTLVLGEPWADVTKSYPGSGTTWEDCDK
>gi|49478343|ref|YP_037789.1| chitinase Bacillus thuringiensis serovar konkukian str. 97-27 MLNKFKFICCTLVIFLLLPLAPFQAQAANNLGSKLLVGYWHNFDNGTGIIKLRDVSPKWDVINVSFGETG GDRSTVEFSPVYGTDAEFKSDISYLKSKGKKVVLSIGGQNGVVLLPDNAAKQRFINSIQSLIDKYGFDGI
>gi|49329899|gb|AAT60545.1| chitinase [Bacillus thuringiensis serovar konkukian str. 97-27 MLNKFKFICCTLVIFLLLPLAPFQAQAANNLGSKLLVGYWHNFDNGTGIIKLRDVSPKWDVINVSFGETG GDRSTVEFSPVYGTDAEFKSDISYLKSKGKKVVLSIGGQNGVVLLPDNAAKQRFINSIQSLIDKYGFDGI
CountZero
"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
In reply to Re: How to eliminate redundancy in huge dataset (1,000 - 10,000)
by CountZero
in thread How to eliminate redundancy in huge dataset (1,000 - 10,000)
by Kenin