I am a beginner in Perl scripting. Please help me write a script for this problem. My dataset is at the bottom of this post.
The data shows some gene sequences from bacteria. Normally our task is to analyze anywhere from 1,000 to 10,000 sequences at a time. Problem: there are a number of duplicates in the dataset with the same ID.
To solve this, I need to write code that eliminates the redundancy and preserves only one copy of each gene sequence (for example, 49329899 should occur only once in the dataset).
The letters on the 2nd and 3rd lines of each record (the sequence itself) should also be preserved in the dataset. The example below has two records for "gi|49329899".
Thanks in advance
kenin

Data.txt

>gi|49329899|gb|AAT60545.1| chitinase [Bacillus thuringiensis serovar konkukian str. 97-27
MLNKFKFICCTLVIFLLLPLAPFQAQAANNLGSKLLVGYWHNFDNGTGIIKLRDVSPKWDVINVSFGETG
GDRSTVEFSPVYGTDAEFKSDISYLKSKGKKVVLSIGGQNGVVLLPDNAAKQRFINSIQSLIDKYGFDGI
>gi|49330053|gb|AAT60699.1| chitinase [Bacillus thuringiensis serovar konkukian str. 97-27
MKSKKFTLLLLSLLLFLPLFLTNFITPNVVLADSQKQDQKIVGYFPSWGIYGRNYQVADIDASKLTHLNY
AFADICWNGKHGNPSTHPDNPNKQTWNCKESGVPLQNKEVPNGTLVLGEPWADVTKSYPGSGTTWEDCDK
>gi|49478343|ref|YP_037789.1| chitinase [Bacillus thuringiensis serovar konkukian str. 97-27]
MLNKFKFICCTLVIFLLLPLAPFQAQAANNLGSKLLVGYWHNFDNGTGIIKLRDVSPKWDVINVSFGETG
GDRSTVEFSPVYGTDAEFKSDISYLKSKGKKVVLSIGGQNGVVLLPDNAAKQRFINSIQSLIDKYGFDGI
>gi|49329899|gb|AAT60545.1| chitinase [Bacillus thuringiensis serovar konkukian str. 97-27
MLNKFKFICCTLVIFLLLPLAPFQAQAANNLGSKLLVGYWHNFDNGTGIIKLRDVSPKWDVINVSFGETG
GDRSTVEFSPVYGTDAEFKSDISYLKSKGKKVVLSIGGQNGVVLLPDNAAKQRFINSIQSLIDKYGFDGI
>gi|49478497|ref|YP_034712.1| chitinase [Bacillus thuringiensis serovar konkukian str. 97-27]
MKSKKFTLLLLSLLLFLPLFLTNFITPNVVLADSQKQDQKIVGYFPSWGIYGRNYQVADIDASKLTHLNY
AFADICWNGKHGNPSTHPDNPNKQTWNCKESGVPLQNKEVPNGTLVLGEPWADVTKSYPGSGTTWEDCDK
YARCGNFGELKRLKAKYPHLKTIISVGGWTWSNRFSDMAADEKTRKVFADSTVDFLREYGFDGVDLDWEY
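For reference, here is a minimal sketch of one way the task described above could be approached, not a definitive solution. It assumes each record starts with a header line beginning with ">", that the ID to compare is the number right after "gi|", and that the first copy of each ID is the one to keep. The name `dedupe_fasta` and the file names in the usage comment are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the records in $text with duplicate IDs removed.
# A record is a ">" header line plus the sequence lines that follow it.
# Assumption: records are keyed on the number after "gi|"; headers
# without one are keyed on the whole header line instead.
sub dedupe_fasta {
    my ($text) = @_;
    my %seen;        # IDs already emitted
    my $keep = 1;    # lines before the first header pass through unchanged
    my @out;
    for my $line (split /\n/, $text) {
        if ($line =~ /^>/) {
            my ($id) = $line =~ /^>gi\|(\d+)\|/;
            $id = $line unless defined $id;
            # Keep this record only the first time its ID shows up.
            $keep = !$seen{$id}++;
        }
        push @out, $line if $keep;
    }
    return @out ? join("\n", @out) . "\n" : "";
}

# Hypothetical usage: perl dedupe.pl input.txt > unique.txt
if (!caller && @ARGV) {
    local $/;                 # slurp the whole input at once
    my $text = <>;
    print dedupe_fasta($text) if defined $text;
}
```

Because the comparison is on the ID only, two records with different IDs but identical sequences (such as 49329899 and 49478343 in the sample) are both kept. If those should also be merged, the hash key would have to be the sequence text rather than the ID.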